A study of control systems dynamics and optimization has always needed a suitable mathematical language to formulate the problems, analyse possible behaviours and difficulties, develop algorithms and prove that these algorithms have the desired properties. This is also true of Iterative Control, which exhibits all the properties and challenges of classical control problems with the added need to consider the effect of iteration on behaviours. The material in this chapter acts to remind readers of the mathematics needed for analysis of state space models in control theory and the essential structure of quadratic optimization problems. To this mix is added an introduction to the essentials of Hilbert spaces as a representation of signals, as a means of representing dynamical systems as operators on such spaces and as a means of creating a geometric approach to iterative optimization based control. The presentation aims both to define notation and to explain concepts. Fuller details and proofs of the statements can be found in the references.

2.1 Elements of Matrix Theory

A \(p \times q\) real (or complex) matrix A is an array of real (or complex) numbers of the form

$$\begin{aligned} A= \left[ \begin{array}{ccccc} A_{11} &{} A_{12} &{} A_{13}&{} \cdots &{} A_{1q} \\ A_{21} &{} A_{22} &{} A_{23}&{} \cdots &{} A_{2q} \\ \vdots &{} &{} &{} &{} \vdots \\ A_{p1} &{} A_{p2} &{} A_{p3}&{} \cdots &{} A_{pq} \\ \end{array} \right] \end{aligned}$$
(2.1)

The element in the ith row and jth column is denoted \(A_{ij}\). If \(q=1\), the matrix is often called a vector. Block matrices can also be defined where the \(A_{ij}\) are \(p_i \times q_j\) sub-matrices. In this case the dimensions of the matrix A are \(\sum _{i=1}^p p_i\times \sum _{j=1}^{q}q_j\).

The following definitions and properties are essential for control theoretical purposes:

  1.

    The set of \(p \times 1\) real (respectively complex) vectors is given the symbol \(\mathscr {R}^p\) (respectively \(\mathscr {C}^p\)).

  2.

    A is said to be square if the number of rows is equal to the number of columns.

  3.

    Addition of two \(p \times q\) matrices A and B to form a \(p \times q \) matrix C, written \(C=A+B\), is defined by the elements

    $$\begin{aligned} C_{ij} = A_{ij}+ B_{ij}, \quad for \quad 1 \le i \le p, \quad 1 \le j \le q. \end{aligned}$$
    (2.2)

    The \(p \times q\) zero matrix is the matrix with all elements equal to zero.

  4.

    Multiplication of A by a scalar \(\lambda \) produces a matrix \(C=\lambda A\) where

    $$\begin{aligned} C_{ij} = \lambda A_{ij}, \quad for \quad 1 \le i \le p,\quad 1 \le j \le q. \end{aligned}$$
    (2.3)
  5.

    Multiplication of a \(p \times q\) matrix A by a \(q \times r\) matrix B to produce a \(p \times r\) matrix \(C=AB\) is defined by the following computation of elements of C

    $$\begin{aligned} C_{ij} = \sum _{k=1}^q A_{ik} B_{kj} ,\quad for\quad 1 \le i \le p,\quad 1 \le j \le r. \end{aligned}$$
    (2.4)
  6.

    The transpose of a \(p \times q\) matrix A is the \(q \times p\) matrix \(A^T\) with

    $$\begin{aligned} (A^T)_{ji} = A_{ij}, \quad for \quad 1 \le i \le p,\quad 1 \le j \le q. \end{aligned}$$
    (2.5)

    The act of taking the transpose of a product satisfies the rule \((AB)^T=B^TA^T\). If \(A=A^T\) then A is said to be symmetric .

  7.

    The conjugate transpose of a complex \(p \times q\) matrix A is the \(q \times p\) matrix \(A^*\) with

    $$\begin{aligned} (A^*)_{ji} = \overline{A}_{ij}, \quad for \quad 1 \le i \le p,\quad 1 \le j \le q, \end{aligned}$$
    (2.6)

    where \(\overline{a}\) denotes the complex conjugate of a. The act of taking the conjugate transpose of a product satisfies the rule \((AB)^*=B^*A^*\). If \(A=A^*\) then A is said to be Hermitian . If A is real then the conjugate transpose is simply the transpose and, if A is Hermitian, it is symmetric.

  8.

    The determinant of a square \(p \times p\) matrix A is denoted by \(\det [A]\), by |A| or, in full, by

    $$\begin{aligned} \det [A]= \left| \begin{array}{ccccc} A_{11} &{} A_{12} &{} A_{13}&{} \cdots &{} A_{1p} \\ A_{21} &{} A_{22} &{} A_{23}&{} \cdots &{} A_{2p} \\ \vdots &{} &{} &{} &{} \vdots \\ A_{p1} &{} A_{p2} &{} A_{p3}&{} \cdots &{} A_{pp} \\ \end{array} \right| \end{aligned}$$
    (2.7)

    The determinant has many properties including

    a.

      the properties \(\det [A] = \det [A^T]\) and \(\det [\overline{A}] = \overline{\det [A]}\).

    b.

      If both A and B are square \(p \times p\) matrices, then \(\det [AB] =\det [A]\det [B] = \det [BA]\).

  9.

    A is said to be an injection or one-to-one if the only \(q \times 1\) vector v satisfying the equation \(Av=0\) is the vector \(v=0\). In general, the set of vectors v such that \(Av=0\) is a vector subspace of \(\mathscr {R}^q\) (or \(\mathscr {C}^q\)) and is called the kernel or null space of A and denoted

    $$\begin{aligned} ker[A] = \{ v:Av=0\} \end{aligned}$$
    (2.8)

    The subspace ker[A] is always \(\ne \{0\}\) when \(q >p\).

  10.

    If, for every \(p \times 1\) vector w, there exists a vector v such that \(w=Av\), then A is said to be onto or a surjection . More generally, the set of all vectors w for which there exists a vector v such that \(w=Av\) is called the range of A. It is a vector subspace of \(\mathscr {R}^p\) (or \( \mathscr {C}^p\)) and is denoted by

    $$\begin{aligned} \mathscr {R}[A] = \{ w:w = Av \quad for\,some\,vector~v \} \end{aligned}$$
    (2.9)

    A necessary condition for the range to be equal to \(\mathscr {R}^p\) (or \(\mathscr {C}^p\) as appropriate) is that \(q \ge p\).

  11.

    If A is both a surjection and an injection, it is said to be a bijection (or simply nonsingular). If A is a \(p \times p\) square matrix, then it is a bijection if, and only if, it has non-zero determinant. If \(\det [A]=0\) then A is said to be singular.

  12.

    A \(p \times q\) matrix A is invertible if, and only if, it is a bijection. In particular, this requires that it is square (\(p=q\)) and it is equivalent to the statement that, for every vector w, there exists a unique vector v such that \(w=Av\). The inverse of A is denoted by \(A^{-1}\). It is a square \( p \times p\) matrix of the form

    $$\begin{aligned} A^{-1} = \frac{adj[A]}{\det [A]} \end{aligned}$$
    (2.10)

    where the adjugate matrix adj[A] has elements that are well defined polynomials in the elements of A.

  13.

    For all invertible \(p \times p\) matrices A, the inverse \(A^{-1}\) satisfies the equations

    $$\begin{aligned} AA^{-1}=A^{-1}A = I_p \end{aligned}$$
    (2.11)

    where \(I_p\) denotes the \(p \times p\) unit matrix or identity

    $$\begin{aligned} I_p= \left[ \begin{array}{ccccc} 1 &{} 0 &{} 0 &{} \cdots &{} 0 \\ 0 &{} 1 &{} 0&{} \cdots &{} \\ \vdots &{} &{} &{} &{} \vdots \\ 0 &{} 0 &{} 0 &{} \cdots &{} 1 \\ \end{array} \right] \end{aligned}$$
    (2.12)

    For any \(p \times q \) matrix B and \(q \times p\) matrix C, the properties \(I_pB=B\) and \(CI_p = C\) hold true. Also \(\det [I_p]=1\) and hence \(\det [A^{-1}] = \left( \det [A] \right) ^{-1}\). Note: for notational simplicity, the subscript p on \(I_p\) is sometimes dropped to leave the symbol I. This should cause no confusion as matrix dimensions are usually clear from the context.

  14.

    If A and B are \(p \times p\) nonsingular matrices, then \((AB)^{-1}= B^{-1}A^{-1}\).

  15.

    If T is square and nonsingular, then the mapping \(A \mapsto T^{-1}AT\) is a similarity transformation. Both A and \(T^{-1}AT\) have the same eigenvalues and \(\det [A] = \det [T^{-1}AT]\).

  16.

    For non-square \(p \times q\) matrices A, other definitions of inverse play a role in matrix analysis. In particular, if \(p \ge q\), a left inverse B of A is any matrix satisfying the condition \(BA=I_q\). In a similar manner, if \(p \le q\), a right inverse B of A is any matrix satisfying the condition \(AB=I_p\). A left inverse of A exists if, and only if, A has kernel \(\{0\}\) and a right inverse exists if, and only if, \(\mathscr {R}[A] = \mathscr {R}^p\) (or \(\mathscr {C}^p\)). If \(p \ne q\), any left or right inverse is non-unique. If \(p=q\), then they are unique and equal to the inverse \(A^{-1}\). Specific examples of left, respectively right, inverses are given by, respectively,

    $$\begin{aligned} B= (A^*A)^{-1}A^*, \quad B=A^*(AA^*)^{-1}. \end{aligned}$$
    (2.13)
  17.

    Given the definition of the unit matrix, two useful relationships are as follows

    a.

      If A and B are, respectively, \(p \times q\) and \(q \times p\), then

      $$\begin{aligned} \det [I_p +AB] = \det [I_q+BA]. \end{aligned}$$
      (2.14)
    b.

      If A has the partitioned form

      $$\begin{aligned} A= \left[ \begin{array}{cc} M_{11} &{} M_{12} \\ M_{21} &{} M_{22} \end{array} \right] \end{aligned}$$
      (2.15)

      with \(M_{11}\) square and nonsingular, then Schur’s Formula is valid,

      $$\begin{aligned} \det [A] = \det [M_{11}] \det [M_{22}-M_{21}M_{11}^{-1}M_{12}]. \end{aligned}$$
      (2.16)
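
As an informal numerical check of some of the identities above (the product rule for determinants, the identity \(\det [I_p+AB]=\det [I_q+BA]\) of (2.14), Schur’s formula (2.16) and the left inverse of (2.13)), the following sketch may help. It assumes Python with numpy and uses random test matrices; both choices are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4, 3
A = rng.standard_normal((p, q))
B = rng.standard_normal((q, p))

# det[I_p + AB] = det[I_q + BA]  (Eq. 2.14)
lhs = np.linalg.det(np.eye(p) + A @ B)
rhs = np.linalg.det(np.eye(q) + B @ A)
print(np.isclose(lhs, rhs))                      # True

# Schur's formula (Eq. 2.16) on a random partitioned matrix
M11 = rng.standard_normal((p, p)) + p * np.eye(p)    # kept safely nonsingular
M12 = rng.standard_normal((p, q))
M21 = rng.standard_normal((q, p))
M22 = rng.standard_normal((q, q))
M = np.block([[M11, M12], [M21, M22]])
schur = np.linalg.det(M11) * np.linalg.det(M22 - M21 @ np.linalg.inv(M11) @ M12)
print(np.isclose(np.linalg.det(M), schur))       # True

# Left inverse (A*A)^{-1}A* of the tall, injective matrix A (Eq. 2.13)
B_left = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(B_left @ A, np.eye(q)))        # True: B_left A = I_q
```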

The above algebraic properties of matrices are the basis of matrix manipulation. For analysis purposes, a number of other properties and concepts are required and are summarized as follows:

  1.

    A finite set \(\{H_j\}_{1 \le j \le M}\) of real (respectively complex) \(p \times q\) matrices is said to be linearly independent if, and only if, the only real (respectively complex) scalars \(\{a_j\}_{1 \le j \le M}\) satisfying the condition \(\sum _{j=1}^M a_jH_j =0\) are \(a_j=0,~ 1 \le j \le M\).

  2.

    The rank of a \(p \times q\) matrix A is the maximum number of linearly independent columns of A regarded as \(p \times 1\) vectors. A \(p \times p\) matrix A is nonsingular if, and only if, it has rank equal to its dimension p.

  3.

    The characteristic polynomial of a square \(p \times p\) matrix A is defined by the determinant

    $$\begin{aligned} \rho (s)= \left| sI_p -A \right| = \sum _{j=0}^p a_{p-j}s^j \quad with~ a_0 =1. \end{aligned}$$
    (2.17)

    It is a polynomial of degree p in the complex variable s with p, possibly complex, roots \(\lambda _j,~ 1 \le j \le p\) called the eigenvalues of A. If A is a real matrix, then the eigenvalues are either real or exist in complex conjugate pairs. More precisely, if \(\lambda \) is an eigenvalue , then its complex conjugate \(\overline{\lambda }\) is also an eigenvalue. The spectral radius of A is defined by

    $$\begin{aligned} r(A) = \max _{1 \le j \le p}|\lambda _j|. \end{aligned}$$
    (2.18)
  4.

    A complex number \(\lambda \) is an eigenvalue of A if, and only if, there exists a non-zero solution vector \(v \in \mathscr {C}^p\) solving the equation

    $$\begin{aligned} Av=\lambda v \end{aligned}$$
    (2.19)

    Such an eigenvector is not uniquely defined as, for example, it can be multiplied by any non-zero scalar and still be an eigenvector. If A has p linearly independent eigenvectors \(\{v_j\}_{1 \le j \le p}\) then an eigenvector matrix E of A is defined to be the block matrix \(E=\left[ v_1, v_2, \ldots ,v_p \right] \). It is nonsingular and can be used to diagonalize A using the similarity transformation

    $$\begin{aligned} E^{-1}A E = \left[ \begin{array}{ccccc} \lambda _1 &{} 0 &{} 0 &{} \cdots &{} 0 \\ 0 &{} \lambda _2 &{} 0&{} \cdots &{} \\ \vdots &{} &{} &{} &{} \vdots \\ 0 &{} 0 &{} 0 &{} \cdots &{} \lambda _p \\ \end{array} \right] = diag\left[ \lambda _1, \lambda _2, \ldots , \lambda _p \right] \end{aligned}$$
    (2.20)

    The diagonal matrix produced is often called the diagonal canonical form of A. A always has p linearly independent eigenvectors if its p eigenvalues are distinct.

  5.

    As \(|sI-A|=\overline{|\overline{s}I-A^*|}\), the eigenvalues of the conjugate transpose matrix are exactly the complex conjugates of the eigenvalues \(\{\lambda _j\}\) of A. Suppose that the eigenvectors of \(A^*\) are denoted \(w_j,~ 1 \le j \le p\) and that \(A^*w_j = \overline{\lambda }_j w_j\). This can be rewritten in the form \(w_j^*A = \lambda _j w_j^*\) and \(w^*_j\) is termed a left eigenvector of A. If A has p linearly independent eigenvectors and associated eigenvector matrix E, then

    $$\begin{aligned} E^{-1}A = diag[\lambda _1, \lambda _2, \ldots , \lambda _p]E^{-1}. \end{aligned}$$
    (2.21)

    Equating rows of the two sides of the equation indicates that the rows of \(E^{-1}\) are left eigenvectors of A and, as \(E^{-1}E=I\), these left eigenvectors satisfy the conditions

    $$\begin{aligned} w_i^* v_j = \delta _{ij}, \end{aligned}$$
    (2.22)

    where \(\delta _{ij}\) is the Kronecker delta, defined as \(\delta _{ij}=0\) whenever \(i \ne j\) and unity otherwise.

  6.

    If A is Hermitian, then its eigenvalues are all real valued. Its eigenvectors \(v_j\) can be chosen to form a set of p linearly independent vectors. It is always possible to choose and scale these vectors so that they satisfy the orthogonality condition

    $$\begin{aligned} v_i^* v_j = \delta _{ij} \end{aligned}$$
    (2.23)

    Under these circumstances, \(E^{-1}=E^*\) so that \(E^*E=I\), an example of a unitary matrix . If E is real then it is an orthogonal matrix.

  7.

    Almost every square matrix A can be diagonalized in the manner shown above. In some cases, diagonalization is not possible but, in such cases, a Jordan canonical form can be produced. More precisely, there exists an integer q and a nonsingular \(p \times p\) matrix J with the property that

    $$\begin{aligned} J^{-1}A J = \left[ \begin{array}{ccccc} J_1 &{} 0 &{} 0 &{} \cdots &{} 0 \\ 0 &{} J_2 &{} 0&{} \cdots &{} \\ \vdots &{} &{} &{} &{} \vdots \\ 0 &{} 0 &{} 0 &{} \cdots &{} J_q \\ \end{array} \right] = blockdiag\left[ J_1, J_2, \ldots , J_q \right] \end{aligned}$$
    (2.24)

    where each Jordan block \(J_j, 1 \le j \le q\) has the structure of a \(q_j \times q_j\) matrix as follows

    $$\begin{aligned} J_j = \left[ \begin{array}{cccccc} \gamma _j &{} 1 &{} 0 &{} \cdots &{} 0 &{}0 \\ 0 &{} \gamma _j &{} 1&{} \cdots &{} 0 &{} 0 \\ \vdots &{} &{} \vdots &{} &{} &{} \vdots \\ 0 &{} 0 &{} 0 &{} \cdots &{} \gamma _j &{} 1 \\ 0 &{} 0 &{} 0 &{} \cdots &{} 0 &{} \gamma _j \\ \end{array} \right] \end{aligned}$$
    (2.25)

    where each \(\gamma _j\) is an eigenvalue of A.

  8.

    In all cases the determinant of A is computed from the product of all p eigenvalues

    $$\begin{aligned} \det [A]= \prod _{j=1}^p~ \lambda _j. \end{aligned}$$
    (2.26)

    A is singular if, and only if, it has a zero eigenvalue and hence \(ker[A] \ne \{0\}\).

  9.

    In all cases, a square matrix A “satisfies its own characteristic equation”

    $$\begin{aligned} \rho (A) = \sum _{j=0}^p a_{p-j}A^j = 0. \end{aligned}$$
    (2.27)

    This statement is normally known as the Cayley-Hamilton Theorem. The result is the basis of many theoretical simplifications and insights, exemplified by the easily proven fact that, if A is nonsingular, then \(a_p \ne 0\) and

    $$\begin{aligned} A^{-1} = - a_p^{-1}\sum _{j=1}^p a_{p-j}A^{j-1}. \end{aligned}$$
    (2.28)

    That is, the inverse can be expressed in terms of powers of A and the coefficients in the characteristic polynomial. Note: A related polynomial is the minimum polynomial of A which is the uniquely defined polynomial \(\rho _{min}(s)\) of minimum degree that has the property \(\rho _{min}(A)=0\). The degree of the minimum polynomial is always less than or equal to p and is always equal to p if the eigenvalues of A are distinct.

  10.

    More generally, the Cayley-Hamilton theorem implies useful facts about functions of matrices. If f(s) is an analytic function of the complex variable s expressible as a power series \(\sum _{j=0}^{\infty }f_j s^j\) with radius of convergence R, then the symbol f(A) denotes the associated function of A defined by

    $$\begin{aligned} f(A) = \sum _{j=0}^{\infty }f_j A^j \end{aligned}$$
    (2.29)

    This series converges whenever the spectral radius \(r(A) < R\). For example,

    a.

      the exponential function \(e^s\) has a power series expansion with \(f_j= \frac{1}{j!}\). The corresponding matrix exponential is

      $$\begin{aligned} e^A = \sum _{j=0}^{\infty } \frac{1}{j!}A^j = I+A+\frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots . \end{aligned}$$
      (2.30)
    b.

      The function \((1-s)^{-1} = \sum _{j=0}^{\infty } s^j\) has a radius of convergence \(R=1\). It follows that, if the spectral radius \(r(A) <1\), the matrix inverse \((I-A)^{-1}\) exists and has the convergent series expansion

      $$\begin{aligned} (I-A)^{-1} = \sum _{j=0}^{\infty } A^j. \end{aligned}$$
      (2.31)

    If A has a nonsingular eigenvector matrix E, then \(A^j = E~diag[\lambda _1^j, \ldots , \lambda _p^j]E^{-1}\) and

    $$\begin{aligned} f(A) = \sum _{j=0}^{\infty }f_j A^j = E ~diag[f(\lambda _1), \ldots , f(\lambda _p)]E^{-1} \end{aligned}$$
    (2.32)
  11.

    From the Cayley-Hamilton theorem it is easily seen that all powers \(A^j\) with \(j \ge p\) can be expressed as a polynomial in A of degree less than or equal to \(p-1\). It follows that all functions f(A) can be expressed as a polynomial in A of degree less than or equal to \(p-1\) by suitable choice of coefficients.

  12.

    The Spectral Mapping Theorem states that, if A has eigenvalues \(\lambda _j, ~ 1 \le j \le p\), and \(r(A) < R\), then the eigenvalues of f(A) are precisely \(f(\lambda _j), ~ 1 \le j \le p\).
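
The series definitions of functions of a matrix and the Spectral Mapping Theorem can be explored numerically. The sketch below is a purely illustrative assumption-laden example (Python with numpy, a random matrix rescaled so that \(r(A)<1\), and truncated series); it is not a recommended general-purpose algorithm for computing matrix functions.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
p = 4
A = rng.standard_normal((p, p))
A = 0.5 * A / max(abs(np.linalg.eigvals(A)))     # rescale so that r(A) = 0.5 < 1

# Truncated exponential series (Eq. 2.30); 30 terms is ample for this small A
expA = sum(np.linalg.matrix_power(A, j) / factorial(j) for j in range(30))

# Neumann series (Eq. 2.31), convergent because r(A) < 1
neumann = sum(np.linalg.matrix_power(A, j) for j in range(500))
print(np.allclose(neumann, np.linalg.inv(np.eye(p) - A)))        # True

# Spectral mapping: eigenvalues of exp(A) are exp(eigenvalues of A), up to rounding/ordering
print(np.allclose(np.sort_complex(np.linalg.eigvals(expA)),
                  np.sort_complex(np.exp(np.linalg.eigvals(A)))))  # True
```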

The final group of useful properties is associated with the idea of positivity of quadratic forms:

  1.

    Suppose that A is a square, real, symmetric, \(p \times p\) matrix and x an arbitrary \(p \times 1\) vector in \(\mathscr {R}^p\). Then the quadratic function \(x^TAx\) is a quadratic form . If A is not symmetric, it can always be replaced by a symmetric matrix as \(x^TAx \equiv x^T\left( \frac{A+A^T}{2} \right) x\).

  2.

    If A is complex then the quadratic form is defined on \(\mathscr {C}^p\) as \(x^*Ax\). If A is Hermitian, then \(x^*Ax\) takes only real values.

  3.

    A real matrix A is said to be positive if, and only if, \(x^TAx \ge 0\) for all vectors \(x \in \mathscr {R}^p\). If \(x^TAx > 0\) whenever \(x \ne 0\), then A is said to be positive definite and written in the form \(A>0\). If A is positive but not positive definite, it is positive semi-definite and written in the form \(A \ge 0\). The expression \(A \ge B\) (respectively \(A > B\)) is equivalent to \(A-B \ge 0\) (respectively \(A-B >0\)). Similar definitions are used for complex matrices and their associated quadratic forms.

  4.

    Conditions for positivity for real, symmetric matrices include the following

    a.

      A real, symmetric matrix A is positive if, and only if, all its eigenvalues satisfy the inequalities \(\lambda _j \ge 0,~ 1 \le j \le p\). It is positive definite if, and only if, all eigenvalues are strictly positive. Positive definite, symmetric matrices are hence always invertible.

    b.

      If A and B are matrices and \(A=B^TB\), then A is positive. A is positive definite if, and only if, \(ker[B]=\{0\}\).

    c.

      A real, symmetric \(p \times p\) matrix A is positive definite if, and only if, the leading Principal Minors satisfy

      $$\begin{aligned} \left| \begin{array}{ccccc} A_{11} &{} A_{12} &{} A_{13}&{} \cdots &{} A_{1q} \\ A_{21} &{} A_{22} &{} A_{23}&{} \cdots &{} A_{2q} \\ \vdots &{} &{} &{} &{} \vdots \\ A_{q1} &{} A_{q2} &{} A_{q3}&{} \cdots &{} A_{qq} \\ \end{array} \right| >0, \quad \quad for~1 \le q \le p. \end{aligned}$$
      (2.33)
  5.

    Positive, symmetric, real matrices A always have an orthogonal eigenvector matrix E and can be expressed in the form \(A=Ediag[\lambda _1, \ldots , \lambda _p]E^T\) with \(E^TE=I\).

  6.

    A positive, symmetric, real matrix A has a unique symmetric, positive square root B such that \(B^2=A\). As with scalar square roots, B is usually denoted by the symbol \(A^{\frac{1}{2}}\). It can be computed from the formula

    $$\begin{aligned} A^{\frac{1}{2}}=Ediag[\lambda _1^{\frac{1}{2}}, \ldots , \lambda _p^{\frac{1}{2}}]E^T \end{aligned}$$
    (2.34)
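
The eigenvalue test, the leading principal minor test (2.33) and the construction of the square root \(A^{\frac{1}{2}}\) in (2.34) can be sketched numerically as follows. The snippet assumes Python with numpy and a randomly generated positive definite matrix; it is illustrative only and not a robust test for large or ill-conditioned matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5
B = rng.standard_normal((p, p))
A = B.T @ B + np.eye(p)            # symmetric and positive definite by construction

# Test 1: all eigenvalues strictly positive
lam, E = np.linalg.eigh(A)         # eigh: eigen-decomposition of a symmetric matrix
print(np.all(lam > 0))             # True

# Test 2: all leading principal minors positive (Eq. 2.33)
minors = [np.linalg.det(A[:k, :k]) for k in range(1, p + 1)]
print(all(m > 0 for m in minors))  # True

# Symmetric positive square root A^(1/2) = E diag(sqrt(lam)) E^T (Eq. 2.34)
A_half = E @ np.diag(np.sqrt(lam)) @ E.T
print(np.allclose(A_half @ A_half, A))   # True
```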

2.2 Quadratic Optimization and Quadratic Forms

2.2.1 Completing the Square

The conceptual basis of much of optimization theory used in control systems algorithms has its origins in the simple ideas of minimization of quadratic functions of vectors in \(\mathscr {R}^p\). This short section explains the basic ideas using a simple example and without the need for advanced mathematical methodologies more complicated than the matrix theory described above. The problem used to illustrate the ideas is that of minimizing the quadratic objective function

$$\begin{aligned} J(x) = x^TAx + 2b^Tx +c. \end{aligned}$$
(2.35)

where the \(p \times p\) matrix A is real, symmetric and positive definite, b is real and \(p \times 1\) and c is a real number. The solution is easily found by completing the square and verifying that

$$\begin{aligned} J(x) = (x+A^{-1}b)^TA(x+A^{-1}b) - b^TA^{-1}b +c. \end{aligned}$$
(2.36)

The last two terms are independent of x. The fact that A is positive definite immediately yields the fact that the minimum value occurs when the first term is zero. The unique minimizing solution is hence

$$\begin{aligned} x_{\infty }=-A^{-1}b \quad and \quad J(x_{\infty }) = - b^TA^{-1}b +c. \end{aligned}$$
(2.37)
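
A minimal numerical sketch of this construction is given below; it assumes Python with numpy, a randomly generated positive definite A, and solves \(Ax=-b\) directly rather than forming \(A^{-1}\) explicitly.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
M = rng.standard_normal((p, p))
A = M.T @ M + np.eye(p)              # real, symmetric, positive definite
b = rng.standard_normal(p)
c = 1.0

def J(x):
    # Quadratic objective of Eq. (2.35)
    return x @ A @ x + 2 * b @ x + c

x_min = np.linalg.solve(A, -b)       # x_inf = -A^{-1} b  (Eq. 2.37)
print(np.isclose(J(x_min), -b @ np.linalg.solve(A, b) + c))   # minimum value matches

# Any perturbation of the minimizer cannot decrease J, consistent with A > 0
d = rng.standard_normal(p)
print(J(x_min + 0.1 * d) >= J(x_min))   # True
```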

Both can be computed using standard software if the matrices involved are of reasonable dimension and not ill-conditioned. Factors causing problems include:

  1.

    Suppose that the eigenvalues of A are listed in order of ascending value \(\lambda _1 \le \lambda _2 \le \cdots \le \lambda _p\) and A is written in its diagonal form \(A=Ediag[\lambda _1, \ldots , \lambda _p]E^T\) with \(E^{-1} = E^T\). The condition number of A is defined to be \(c(A) = \frac{\lambda _p}{\lambda _1}\). It follows that the inverse of A has the structure

    $$\begin{aligned} A^{-1} = Ediag[\lambda _1^{-1}, \lambda _2^{-1} ,\ldots , \lambda _p^{-1} ]E^T \end{aligned}$$
    (2.38)

    The situation where the spread of the eigenvalues of A is large (that is, c(A) is large) can be discussed by considering the case where \(\lambda _1\) is very small. In such situations, small errors in characterizing this eigenvalue can lead to large changes in the computed solution \(x=-A^{-1}b\).

  2.

    These problems are exacerbated if the dimension p is large due to the number of floating point operations necessary in computer computation of \(A^{-1}\).
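
To make the conditioning issue tangible, the hedged sketch below (Python with numpy assumed) builds a symmetric positive definite A with one very small eigenvalue and shows that a tiny absolute change in that eigenvalue produces a large relative change in the computed solution \(x=-A^{-1}b\).

```python
import numpy as np

p = 3
E = np.linalg.qr(np.random.default_rng(4).standard_normal((p, p)))[0]   # orthogonal E
lam = np.array([1e-6, 1.0, 2.0])                  # lambda_1 is very small
A = E @ np.diag(lam) @ E.T
b = np.ones(p)

print(lam[-1] / lam[0])                           # condition number c(A) = lambda_p / lambda_1

x = np.linalg.solve(A, -b)
# Perturb the smallest eigenvalue slightly and re-solve
A_pert = E @ np.diag(lam + np.array([1e-6, 0.0, 0.0])) @ E.T
x_pert = np.linalg.solve(A_pert, -b)
print(np.linalg.norm(x - x_pert) / np.linalg.norm(x))   # large relative change (~0.5)
```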

In the quadratic problems considered in this text, similar quadratic objective functions will be considered but the “matrices” involved are replaced by operators associated with dynamical systems models and, in intuitive terms, have very high (even infinite) dimensions and extremely large (possibly infinite) condition numbers. Formal solutions paralleling the algebraic constructions illustrated above hence have no immediate computational value. The core of the theoretical problem is therefore that of developing control algorithms that use only feasible computational procedures, implemented using well-conditioned off-line algorithms and on-line feedback controllers.

2.2.2 Singular Values, Lagrangians and Matrix Norms

The singular values \( 0 \le \sigma _1\le \sigma _2 \le \cdots \le \sigma _q\) of a real (respectively complex) \(p \times q\) matrix A are real, non-negative numbers computed from the eigenvalues \(0 \le \lambda _1 \le \cdots \le \lambda _q\) of the symmetric (respectively Hermitian) matrix \(A^TA\) (respectively \(A^*A\)) by writing \(\lambda _j = \sigma _j^2,~ 1 \le j \le q\). The corresponding eigenvectors are often called singular vectors.

Associated with the matrix A is the notion of a matrix norm. As will be seen throughout this text, the idea of a norm is non-unique. What follows, therefore, is only an example that builds on the idea of singular values and illustrates the use of Lagrangian methods in optimization problems. The first step is the definition of a particular vector norm, the Euclidean norm, defined on vectors \(x \in \mathscr {R}^q\) (respectively \(\mathscr {C}^q\)) by \(\Vert x\Vert = \sqrt{x^Tx}\) (respectively \(\sqrt{x^*x}\)). The Euclidean norm induces a norm \(\Vert A\Vert \) on the matrix A by defining

$$\begin{aligned} \Vert A\Vert = \sup _{x \ne 0} \frac{\Vert Ax\Vert }{\Vert x\Vert } = \sup \{\Vert Ax\Vert : \Vert x\Vert =1\} \end{aligned}$$
(2.39)

In particular, it follows that, for all vectors \(x\in \mathscr {R}^q\) (respectively \(\mathscr {C}^q\)),

$$\begin{aligned} \Vert Ax\Vert \le \Vert A\Vert ~\Vert x\Vert . \end{aligned}$$
(2.40)

As a consequence, if A and B are \(p \times q\) and \(q \times r\) matrices, it follows that \(\Vert ABx\Vert \le \Vert A\Vert ~\Vert B\Vert ~ \Vert x\Vert \) and hence

$$\begin{aligned} \Vert AB\Vert \le \Vert A\Vert \Vert B\Vert . \end{aligned}$$
(2.41)

Suppose now that A is real and vectors are in \(\mathscr {R}^q\). From the above, the induced norm is the solution of an optimization problem with an equality constraint, namely

$$\begin{aligned} \Vert A\Vert ^2 = \sup \{\Vert Ax\Vert ^2 : \Vert x\Vert =1\} = \sup \{x^TA^TAx : x^Tx=1\}. \end{aligned}$$
(2.42)

The solution of this problem is computed by finding the stationary points of the Lagrangian

$$\begin{aligned} \mathscr {L}[x,\lambda ]= x^TA^TAx + \lambda \left( 1-x^Tx \right) \end{aligned}$$
(2.43)

where \(\lambda \) is the scalar Lagrange Multiplier for the single constraint \(1 - x^Tx=0\). The stationary points of the Lagrangian are the solutions of the two equations

$$\begin{aligned} 1-x^Tx=0 \quad and \quad A^TAx = \lambda x, \end{aligned}$$
(2.44)

That is, \(\lambda \) is an eigenvalue of \(A^TA\) and the largest value of the optimization objective function \(\Vert Ax\Vert ^2=x^TA^TAx\) is simply the largest eigenvalue of \(A^TA\). Hence

$$\begin{aligned} \Vert A\Vert = \sigma _q. \end{aligned}$$
(2.45)

which provides a simple link between matrix norms and singular values. This relationship also holds for complex matrices operating on \(\mathscr {C}^q\). Finally,

  1.

    If A is \(p \times p\) and nonsingular, then \(\Vert A\Vert = \sigma _p\) and \(\Vert A^{-1}\Vert = \sigma _1^{-1}\).

  2.

    The smallest and largest singular values are denoted by \(\underline{\sigma }(A)= \sigma _1\) and \(\overline{\sigma }(A)= \sigma _p\) respectively.

  3.

    The spectral radius is linked to matrix norms by the formula

    $$\begin{aligned} r(A) = \lim _{k \rightarrow \infty } \Vert A^k\Vert ^{1/k} \end{aligned}$$
    (2.46)

    from which, for all \(\varepsilon >0\), there exists a real number \(M_{\varepsilon } \ge 1\) such that

    $$\begin{aligned} \Vert A^k\Vert \le M_{\varepsilon } (r(A) + \varepsilon )^k \end{aligned}$$
    (2.47)

    If A can be diagonalized by a nonsingular eigenvector matrix E, then it is possible to choose \(\varepsilon = 0\) and \(M_0= \Vert E^{-1}\Vert \Vert E\Vert \).
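
These relationships between the induced norm, the singular values and the spectral radius can be checked numerically. The sketch below assumes Python with numpy and a random matrix rescaled to a known spectral radius; it is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
A = 0.9 * A / max(abs(np.linalg.eigvals(A)))     # scale so that r(A) = 0.9

# Induced Euclidean norm equals the largest singular value (Eq. 2.45)
sigma = np.linalg.svd(A, compute_uv=False)       # singular values in descending order
print(np.isclose(np.linalg.norm(A, 2), sigma[0]))    # True

# The limit formula r(A) = lim ||A^k||^(1/k) (Eq. 2.46)
for k in (10, 50, 200):
    print(k, np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k))
    # the printed values approach the spectral radius 0.9 as k grows
```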

2.3 Banach Spaces, Operators, Norms and Convergent Sequences

2.3.1 Vector Spaces

Matrices are just part of a more general approach to signal analysis based on vector spaces which are a mathematical generalization of the familiar three dimensional world that we live in. A real (respectively complex) vector space \(\mathscr {V}\) is a collection of objects (called vectors) with defined properties of vector addition and multiplication by real (respectively complex) scalars that satisfy the familiar relations

$$\begin{aligned} \begin{array}{rcl} v_1+v_2 &{} = &{} v_2 + v_1\\ v_1+(v_2+v_3) &{} = &{} (v_1+v_2)+v_3 \\ (\lambda _1 + \lambda _2)v &{} = &{} \lambda _1 v + \lambda _2 v \\ \lambda (v_1 + v_2) &{} = &{} \lambda v_1 + \lambda v_2 \end{array} \end{aligned}$$
(2.48)

for all \(v, v_1,v_2, v_3\) in \(\mathscr {V}\) and all scalars \(\lambda , \lambda _1, \lambda _2\). The zero vector in \(\mathscr {V}\) is denoted by the symbol 0. A vector subspace (or, more simply, a subspace) \(\mathscr {U} \subset \mathscr {V}\) is any subset of \(\mathscr {V}\) that is closed under the operations of addition and multiplication by scalars defined above (and is hence a vector space in its own right).

It is easily seen that \(\mathscr {R}^p\) (respectively \(\mathscr {C}^p\)) is a real (respectively complex) vector space. Also the set of real (respectively complex) \(p \times q\) matrices is a real (respectively complex) vector space. Other examples and constructs of relevance to this text include

  1.

    If \(\mathscr {V}\) is any real vector space, then its complexification \(\mathscr {V}^c\) is defined to be the complex vector space of all complex vectors \(v=v_1 + i v_2\) with both \(v_1\) and \(v_2\) elements of \(\mathscr {V}\). \(\mathscr {V}^c\) is sometimes written in the form

    $$\begin{aligned} \mathscr {V}^c = \mathscr {V} \oplus i\mathscr {V}. \end{aligned}$$
    (2.49)
  2.

    The space of infinite sequences (or time series) \(\alpha = \{\alpha _0, \alpha _1, \alpha _2, \ldots \}\) with \(\alpha _ j \in \mathscr {R}^p\) (or \(\mathscr {C}^p\)) is a vector space with addition \(\gamma = \alpha + \beta \) and multiplication by scalars \(\gamma = \lambda \alpha \) defined by the equations, \(\gamma _j = \alpha _j + \beta _j\) and \(\gamma _j = \lambda \alpha _j\) for \(j = 0,1,2,3, \ldots \). A number of subspaces are of particular relevance here including \(\ell _{\infty }\) (the subspace of bounded sequences of scalars satisfying \(\sup _{j \ge 0}|\alpha _j| < + \infty \)) and \(\ell _2\) (the subspace of sequences of scalars satisfying \(\sum _{j = 0}^{\infty }|\alpha _j|^2 < + \infty \))

  3.

    The space of real or complex valued continuous functions of a real variable t on an interval \(a \le t \le b\) (denoted \([a,b] \subset \mathscr {R}\)) is denoted by the symbol C[a,b] and is a vector space with the usual definitions of addition and multiplication.

  4.

    The space of all functions defined on [a,b] and taking values in \(\mathscr {R}^p\) (respectively \(\mathscr {C}^p\)) is a real (respectively complex) vector space with the usual definitions of addition and multiplication. The real vector space \(L_2^p[a,b]\) is the set of all real \(p \times 1\) vector-valued functions f such that the Lebesgue integral

    $$\begin{aligned} \Vert f\Vert ^2 = \int _a^b~ \Vert f(t)\Vert ^2 dt \end{aligned}$$
    (2.50)

    is well defined and finite. If Q is any symmetric, positive definite \(p \times p\) matrix, then an equivalent statement is that

    $$\begin{aligned} \Vert f\Vert ^2_Q = \int _a^b~ f^T(t)Qf(t) dt \end{aligned}$$
    (2.51)

    is well defined and finite. If \(p=1\), then the space is written \(L_2[a,b]\).

The notion of linear independence of a set of vectors follows the example of matrix theory. More precisely, a set of vectors \(\{v_j\}_{1 \le j \le M}\) is linearly independent if, and only if, \(\sum _{j=1}^M~ a_j v_j =0\) implies that all scalars \(a_j\) are zero. A basis for \(\mathscr {V}\) is a linearly independent set \(\{v_j\}_{1 \le j \le M}\) such that all elements \(v \in \mathscr {V}\) can be written as a unique linear combination of the \(\{v_j\}_{1 \le j \le M}\). If M is finite, then the space is said to be finite dimensional of dimension M. Otherwise it is infinite dimensional. The spaces \(\mathscr {R}^p\) and \(\mathscr {C}^p\) have dimension p whilst C[a,b] and \(L_2[a,b]\) are infinite dimensional. For infinite dimensional spaces, the definition is more clearly stated by saying that the basis set \(\{v_j\}_{j \ge 0}\) has the property that all finite subsets of this set are linearly independent and the set of all finite linear combinations is dense in \(\mathscr {V}\). The concept of denseness is more fully described in Sect. 2.3.4.
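
For vectors in \(\mathscr {R}^p\), linear independence of a finite set can be checked numerically through the rank of the matrix whose columns are those vectors. A small illustrative sketch (Python with numpy assumed) follows.

```python
import numpy as np

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + 2 * v2                      # deliberately dependent on v1 and v2

V = np.column_stack([v1, v2, v3])
# The set is linearly independent iff the rank equals the number of vectors
print(np.linalg.matrix_rank(V) == V.shape[1])      # False: v3 is dependent

W = np.column_stack([v1, v2, np.array([0.0, 0.0, 1.0])])
print(np.linalg.matrix_rank(W) == W.shape[1])      # True: a basis of R^3
```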

Finally,

  1.

    The space \(\mathscr {V}\) is said to be the sum of the vector subspaces \(\{\mathscr {V}_j\}_{ 1 \le j \le q}\) of \(\mathscr {V}\), written

    $$\begin{aligned} \mathscr {V} = \mathscr {V}_1 + \mathscr {V}_2 + \cdots + \mathscr {V}_q, \end{aligned}$$
    (2.52)

    if any vector \(v \in \mathscr {V}\) can be written as a linear combination \(v=\sum _{j=1}^q \alpha _j v_j\) with \(v_j \in \mathscr {V}_j,~ 1 \le j \le q\) and suitable choice of scalars \(\{\alpha _j\}_{1 \le j \le q}\). It is a direct sum decomposition written

    $$\begin{aligned} \mathscr {V} = \mathscr {V}_1 \oplus \mathscr {V}_2 \oplus \cdots \oplus \mathscr {V}_q \end{aligned}$$
    (2.53)

    if, and only if, each \(v \in \mathscr {V}\) can be written as a unique linear combination of elements of the subspaces.

  2.

    A product vector space constructed from vector spaces \(\{\mathscr {V}_j\}_{ 1 \le j \le q}\) is denoted by the Cartesian Product notation

    $$\begin{aligned} \mathscr {V} = \mathscr {V}_1 \times \mathscr {V}_2 \times \cdots \times \mathscr {V}_q \end{aligned}$$
    (2.54)

    and consists of the set of q-tuples \(v=\{v_1, v_2, \ldots , v_q\}\) with \(v_j \in \mathscr {V}_j, 1 \le j \le q\) and the same laws of composition as those defined for (finite) time series. An example of this notation is the real product space \(L_2^p[a,b]\) defined by

    $$\begin{aligned} L_2^p[a,b] = \underbrace{L_2[a,b] \times L_2[a,b] \times \cdots \times L_2[a,b]}_{p~copies} \end{aligned}$$
    (2.55)

    It is sometimes convenient to identify \(L^p_2[a,b]\) with the space of \(p \times 1\) vectors f with elements consisting of real valued functions \(f_j \in L_2[a,b],~ 1 \le j \le p\).

2.3.2 Normed Spaces

Measures of magnitude are important in applications of mathematics and are used extensively in this text as a means of algorithm design and analysis. For the vector space \(\mathscr {R}^p\), the familiar measure is the Euclidean length of the vector defined as \(\Vert v\Vert = \sqrt{v^Tv}\). This is just an example of the more general concept of a vector norm. More precisely, if \(\mathscr {V}\) is a finite or infinite dimensional, real or complex vector space, then a norm on \(\mathscr {V}\) is a mapping from vectors v into real numbers \(\Vert v\Vert \) with the properties that

$$\begin{aligned} \Vert v\Vert&\ge 0 \nonumber \\ \Vert v\Vert&= 0 \quad if, \ and \ only \ if, \quad v=0 \\ \Vert \lambda v\Vert&= |\lambda | \Vert v\Vert \nonumber \\ \Vert v_1 + v_2\Vert&\le \Vert v_1\Vert + \Vert v_2\Vert \nonumber \end{aligned}$$
(2.56)

for all vectors \(v, v_1,v_2\) in \(\mathscr {V}\) and scalars \(\lambda \). Note that the space \(\mathscr {V}\) possessing the norm is normally understood from the context but it is often useful to identify the space using a subscript such as \(\Vert v\Vert _{\mathscr {V}}\) or by other means.

An example of a norm in \(L_2^p[a,b]\) is

$$\begin{aligned} \Vert f\Vert = \left( \int _a^b~ e^{2\alpha t}f^T(t)Qf(t)dt \right) ^{1/2} \end{aligned}$$
(2.57)

where Q is a symmetric, real, positive-definite \(p \times p\) matrix and \(\alpha \) is any real scalar. Also, a norm in C[a,b] can be defined by

$$\begin{aligned} \Vert f\Vert = \sup _{a \le t \le b} \left( e^{\alpha t}|f(t)| \right) . \end{aligned}$$
(2.58)

The \(L_2[a,b]\) norm is also a norm on C[a,b].

For real (or complex) \( p \times q\) matrices, A, one choice of norm is the maximum singular value \(\overline{\sigma }(A)\) of A whilst, if A is real, another is the so-called Frobenius Norm defined by the trace formula

$$\begin{aligned} \Vert A\Vert = \sqrt{tr[A^TA]} \quad (the\, Frobenius\, Norm). \end{aligned}$$
(2.59)

or, more generally,

$$\begin{aligned} \Vert A\Vert = \sqrt{tr[WA^TQA]} \quad (the\, weighted\, Frobenius \,Norm). \end{aligned}$$
(2.60)

where Q and W are symmetric and positive definite matrices.
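
The Frobenius and weighted Frobenius norms of (2.59) and (2.60) can be computed directly from the trace formulae. The sketch below (Python with numpy assumed; the particular Q and W are illustrative choices) also checks the standard equality of the Frobenius norm with the root sum of squared entries.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 2))                  # a p x q matrix with p = 3, q = 2

fro = np.sqrt(np.trace(A.T @ A))                 # Eq. (2.59)
print(np.isclose(fro, np.sqrt((A ** 2).sum())))  # True
print(np.isclose(fro, np.linalg.norm(A, 'fro'))) # True

# Weighted version (Eq. 2.60): Q is p x p, W is q x q, both symmetric positive definite
Q = np.diag([1.0, 2.0, 3.0])
W = np.diag([0.5, 4.0])
fro_w = np.sqrt(np.trace(W @ A.T @ Q @ A))
print(fro_w)
```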

When endowed with a chosen norm \(\Vert \cdot \Vert \), the space \(\mathscr {V}\) is called a normed space. The choice of norm is non-unique and the same underlying vector space, when given different norms, generates a new normed space. For \(\mathscr {R}^p\), the following are norms for \(v=[v_1, v_2, \ldots ,v_p]^T\),

$$\begin{aligned} \Vert v\Vert = \sqrt{v^TQv} ~~(Q=Q^T >0), \quad \Vert v\Vert = \max |v_j|,\quad \Vert v\Vert = (\sum _{j=1}^p |v_j|^q)^{1/q}, q \ge 1. \end{aligned}$$
(2.61)

Two norms \(\Vert \cdot \Vert _1\) and \(\Vert \cdot \Vert _2\) on the same underlying vector space are said to be topologically equivalent if, and only if, there exist scalars \(0 < \beta _1 \le \beta _2\) such that, for all \(v \in \mathscr {V}\),

$$\begin{aligned} \beta _1 \Vert v\Vert _1 \le \Vert v\Vert _2 \le \beta _2\Vert v\Vert _1 \end{aligned}$$
(2.62)

All norms on a given finite dimensional space are topologically equivalent.
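
The equivalence of norms on a finite dimensional space can be seen numerically. The sketch below (Python with numpy assumed) evaluates the three norms of (2.61) with \(Q=I\) and \(q=1\) and checks the standard bounds \(\max _j |v_j| \le \Vert v\Vert _2 \le \sqrt{p}\, \max _j |v_j|\), which supply one admissible pair \(\beta _1, \beta _2\) for those two norms.

```python
import numpy as np

rng = np.random.default_rng(7)
p = 6
for _ in range(5):
    v = rng.standard_normal(p)
    norm_inf = np.max(np.abs(v))                 # max |v_j|
    norm_2 = np.linalg.norm(v)                   # Euclidean norm (Q = I)
    norm_1 = np.sum(np.abs(v))                   # the q = 1 case of the last norm in (2.61)
    # Equivalence bounds with beta_1 = 1 and beta_2 = sqrt(p)
    assert norm_inf <= norm_2 <= np.sqrt(p) * norm_inf
    print(norm_inf, norm_2, norm_1)
```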

2.3.3 Convergence, Closure, Completeness and Banach Spaces

Given a normed space \(\mathscr {V}\), an infinite sequence \( \{v_j\}_{j \ge 0} = \{v_0, v_1, v_2, \ldots \}\) is said to converge in the norm topology (or, more simply, to converge) to a limit vector \(v\in \mathscr {V}\) if, and only if,

$$\begin{aligned} \lim _{j \rightarrow \infty } \Vert v-v_j\Vert _{\mathscr {V}}=0 \quad (written~ \lim _{j \rightarrow \infty } v_j=v). \end{aligned}$$
(2.63)

The nature of this convergence is defined by the norm used but it is easily seen that convergence with respect to one norm implies convergence with respect to any other topologically equivalent norm.

A subset \(S \subset \mathscr {V}\) is said to be an open subset if, for every point \(v \in S\), the Open Ball \(B(v;\delta )\) defined by

$$\begin{aligned} B(v;\delta )= \{ w: w \in \mathscr {V}, \Vert v-w\Vert < \delta \} \end{aligned}$$
(2.64)

lies in S for some choice of \(\delta >0\). S is said to be closed if it contains the limit points of all convergent sequences with elements in S. The closure of a subset S (denoted \(\overline{S}\)) is the set consisting of points in S plus the limits of all convergent sequences in S. One consequence of this is that the Closed Ball \(B_c(v;\delta )\) defined by

$$\begin{aligned} B_c(v;\delta )= \{ w: w \in \mathscr {V}, \Vert v-w\Vert \le \delta \} \end{aligned}$$
(2.65)

is the closure of the open ball \(B(v;\delta )\). Subsets can be neither open nor closed. For real numbers, the symbols [a,b], [a,b), (a,b] and (a,b) are used to denote the intervals, respectively,

$$\begin{aligned} \begin{array}{l} \{t : a \le t \le b \} \quad (a\,closed\,interval), \\ \{t : a \le t < b \} \quad (a\,half\,open\,interval), \\ \{t : a < t \le b \} \quad (a \,half \,open\,interval), \\ \{ t: a < t < b \} \quad (an\,open\,interval), \end{array} \end{aligned}$$
(2.66)

A Cauchy Sequence \( \{v_j\}_{j \ge 0}\) in \(\mathscr {V}\) is a sequence with the property that, for all \(\varepsilon >0\), there exists an integer \(n_{\varepsilon }\) such that \(\Vert v_j - v_k\Vert < \varepsilon \) for all \(j \ge n_{\varepsilon }\) and \(k \ge n_{\varepsilon }\). That is, all points \(v_j\) in the sequence get “closer and closer together” as the index j increases. In general, not all Cauchy sequences converge. An example of this is the space C[a,b] with the \(L_2[a,b]\) norm. It is a simple matter to construct a sequence of continuous functions that converge in norm to a discontinuous function which, by definition, is not in C[a,b]. A normed space where all Cauchy sequences converge is said to be complete and is called a Banach Space. For the purposes of this text, note that all normed spaces used are Banach Spaces unless otherwise stated, including \(\mathscr {R}^p\), \(\mathscr {C}^p\), \(\ell _2\) and \(L_2^p[a,b]\) and their Cartesian products for any p and \(- \infty < a < b < +\infty \).
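
The incompleteness example mentioned above can be visualized numerically: the continuous functions \(f_n(t)=\min (1,\max (-1,nt))\) on \([-1,1]\) form a Cauchy sequence in the \(L_2[-1,1]\) norm whose limit is the discontinuous sign function. The sketch below (Python with numpy assumed, with a crude Riemann-sum approximation of the integral) is purely illustrative.

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 20001)
dt = t[1] - t[0]
l2 = lambda g: np.sqrt(np.sum(g ** 2) * dt)      # crude L2[-1,1] norm approximation

f = lambda n: np.clip(n * t, -1.0, 1.0)          # continuous for every n

# || f_n - f_m || becomes small as n, m grow: a Cauchy sequence in the L2 norm
for n, m in [(10, 20), (100, 200), (1000, 2000)]:
    print(n, m, l2(f(n) - f(m)))

# ...but the L2 limit is sign(t), which is not a continuous function
print(l2(f(1000) - np.sign(t)))                  # small
```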

2.3.4 Linear Operators and Dense Subsets

All \(p \times q\) real matrices are examples of linear operators between the real spaces \(\mathscr {R}^q\) and \(\mathscr {R}^p\). The more general concept of a linear operator \(\varGamma : \mathscr {V} \mapsto \mathscr {W}\) mapping a vector space \(\mathscr {V}\) into another vector space \(\mathscr {W}\) follows similar lines by satisfying the linearity assumptions that, for all \(v, v_1, v_2\) in \(\mathscr {V}\) and scalars \(\lambda \),

$$\begin{aligned} \begin{array}{rcl} \varGamma (v_1+v_2) &{}=&{} \varGamma v_1 + \varGamma v_2 \\ \varGamma (\lambda v) &{}=&{} \lambda \varGamma v \end{array} \end{aligned}$$
(2.67)

For example, if \(\mathscr {V} = L_2^q[0,T]\) and \(\mathscr {W} = L_2^p[0,T]\), T is finite and H(t) is a \(p \times q\) matrix with elements that are continuous in t, then the mapping \(v \mapsto \varGamma v\) defined by the Convolution Integral

$$\begin{aligned} \left( \varGamma v \right) (t) = \int _0^t~ H(t-t')v(t')dt', \quad 0 \le t \le T, \end{aligned}$$
(2.68)

is a well-defined linear operator. The identity or unit operator is the linear operator \(I: \mathscr {V} \mapsto \mathscr {V}\) defined by \(Iv=v\) for all \(v \in \mathscr {V}\). If both \(\mathscr {V}\) and \(\mathscr {W}\) are real vector spaces then a linear operator \(\varGamma : \mathscr {V} \mapsto \mathscr {W}\) can be extended to a linear operator (again denoted by \(\varGamma \)) mapping the complexification \(\mathscr {V}^c\) into the complexification \(\mathscr {W}^c\) by the relation \(\varGamma (u+iv) = \varGamma u + i \varGamma v\). Also two operators \(\varGamma _1: \mathscr {V} \mapsto \mathscr {V}\) and \(\varGamma _2: \mathscr {V} \mapsto \mathscr {V}\) are said to commute if

$$\begin{aligned} \varGamma _1 \varGamma _2 = \varGamma _2 \varGamma _1. \end{aligned}$$
(2.69)

Linear operators can be associated with norms quite easily. More precisely, with the notation used above, suppose that both \(\mathscr {V}\) and \(\mathscr {W}\) are normed spaces, then the operator norm of \(\varGamma \) (induced by the norms in \(\mathscr {V}\) and \(\mathscr {W}\)) is defined to be

$$\begin{aligned} \Vert \varGamma \Vert = \sup _{v \ne 0} \frac{\Vert \varGamma v\Vert _{\mathscr {W}}}{\Vert v\Vert _{\mathscr {V}}} = \sup _{\Vert v\Vert _{\mathscr {V}}=1} \Vert \varGamma v\Vert _{\mathscr {W}} \end{aligned}$$
(2.70)

If the norm is finite, the operator is said to be bounded . In all other cases, it is unbounded. The identity operator is bounded with induced norm \(\Vert I\Vert =1\).
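
To make the convolution operator (2.68) and its induced norm concrete, one can discretize time and approximate \(\varGamma \) by a lower triangular matrix acting on sampled signals. The sketch below (scalar case \(p=q=1\), Python with numpy assumed, and an illustrative impulse response) estimates the induced norm of the discretized operator as its largest singular value; this is a heuristic finite dimensional approximation, not the exact operator norm on \(L_2[0,T]\).

```python
import numpy as np

T, N = 1.0, 200
dt = T / N
t = dt * np.arange(N)

h = lambda s: np.exp(-2.0 * s)                   # an assumed scalar impulse response H(t)

# (Gamma v)(t_i) ~ sum_j h(t_i - t_j) v(t_j) dt : a lower triangular matrix
G = np.zeros((N, N))
for i in range(N):
    G[i, : i + 1] = h(t[i] - t[: i + 1]) * dt

v = np.sin(2 * np.pi * t)                        # a sampled input signal
Gv = G @ v                                       # sampled output of the convolution

# Estimated induced norm of the discretized operator (its largest singular value)
print(np.linalg.svd(G, compute_uv=False)[0])
```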

The definition of the operator norm implies that, for all \( v \in \mathscr {V}\),

$$\begin{aligned} \Vert \varGamma v \Vert _{\mathscr {W}} \le \Vert \varGamma \Vert \Vert v\Vert _{\mathscr {V}} \end{aligned}$$
(2.71)

which implies that boundedness of a linear operator is equivalent to its continuity. In addition, it is easily shown that, if \(\varGamma _2: \mathscr {V} \mapsto \mathscr {W}\) and \(\varGamma _1: \mathscr {W} \mapsto \mathscr {Z}\) are two bounded linear operators between normed spaces, then the composite operator \(\varGamma _1 \varGamma _2: \mathscr {V} \mapsto \mathscr {Z}\) defined by \((\varGamma _1 \varGamma _2)v = \varGamma _1 ( \varGamma _2 v)\) has a norm bound

$$\begin{aligned} \Vert \varGamma _1 \varGamma _2 \Vert \le \Vert \varGamma _1 \Vert \Vert \varGamma _2\Vert . \end{aligned}$$
(2.72)

In a similar manner, the rules for the operator sum \(\varGamma _1+\varGamma _2\), defined by \((\varGamma _1+\varGamma _2)v = \varGamma _1v+\varGamma _2v\), and for multiplication by scalars, \((\lambda \varGamma )v = \lambda (\varGamma v)\), make the set \(\mathscr {L}(\mathscr {V};\mathscr {W})\) of bounded linear operators from \(\mathscr {V}\) into \(\mathscr {W}\) into a normed vector space in its own right. If \(\mathscr {V}\) and \(\mathscr {W}\) are Banach spaces, then so is \(\mathscr {L}(\mathscr {V};\mathscr {W})\).

The kernel and range of an operator \(\varGamma : \mathscr {V} \rightarrow \mathscr {W}\) play a vital role in analysis and are defined (as for matrices) using

$$\begin{aligned} \begin{array}{rcl} ker[\varGamma ] &{}=&{} \{ v \in \mathscr {V}: \varGamma v = 0 \} \\ \mathscr {R}[\varGamma ] &{}=&{} \{w \in \mathscr {W}: w=\varGamma v \, for\, some \, v \in \mathscr {V} \} \end{array} \end{aligned}$$
(2.73)

The operator is injective (or one-to-one) if its kernel is the single point \(\{0\}\) and it is surjective (or onto) if its range is \(\mathscr {W}\). It is bijective if it is both injective and surjective. If \(\varGamma \) is bijective and \(\mathscr {W}\) is finite dimensional, then it has a bounded, linear inverse \(\varGamma ^{-1}: \mathscr {W} \mapsto \mathscr {V}\) defined by the relation \(\varGamma ^{-1}(\varGamma v) = v\) for all \(v \in \mathscr {V}\) or, more simply \(\varGamma ^{-1}\varGamma =I \) where I is the identity in \(\mathscr {V}\). If \(\mathscr {V}=\mathscr {W}\), then, in addition, \(\varGamma \varGamma ^{-1}=I \).

The notion of inverse familiar in matrix theory also has relevance to the interpretation of the idea of inverse operators but with more technical complexity if \(\mathscr {V}\) and \(\mathscr {W}\) are infinite dimensional. One concept that is central to the discussion is that of a dense subset \(S_1\) of a subset S of a normed space \(\mathscr {V}\). More precisely, \(S_1\) is dense in S if, and only if, for every point \(v \in S\) and for every \(\varepsilon > 0\), there exists a point \(v_{\varepsilon } \in S_1\) such that \(\Vert v-v_{\varepsilon }\Vert < \varepsilon \). In effect, every point in S has a point in \(S_1\) arbitrarily close to it. Three observations are related to this

  1.

    If \(\varGamma \) is bounded, then its kernel is closed.

  2.

    \(\varGamma \) being bounded does not necessarily imply that its range is closed.

  3.

    If \(\mathscr {W}\) is finite dimensional and \(\varGamma \) is bounded, \(\varGamma \) has a closed range.

The second observation can be illustrated by the case of a convolution operator (2.68) mapping \(L_2^q[0,T]\) into \(L_2^p[0,T]\). The range of \(\varGamma \) is, at best, the set of continuous \( p \times 1\) vector-valued functions, which is known to be dense in \(L_2^p[0,T]\) but is not the whole space.

The important point that follows from the above is that the range of the operator in infinite dimensional spaces has more complex properties than those observed for matrices. The consequences of this fact are many in number and include the possibility that the inverse of a bounded operator may exist but be unbounded. If \(\varGamma \) has a bounded (and hence continuous) inverse, it is said to be a Homeomorphism . In such cases it is easily seen that \(1 \le \Vert \varGamma \Vert \Vert \varGamma ^{-1}\Vert \).

For matrices, the eigenvalues of a matrix A are defined to be scalars \(\lambda \) which ensure that \(\lambda I -A\) has no inverse. The idea of eigenvalues requires careful generalization to the case of linear operators. More precisely, suppose that \(\mathscr {V}\) is a complex Banach space, then the Resolvent Set of a linear operator \(\varGamma : \mathscr {V} \mapsto \mathscr {V}\) is defined to be the set of complex numbers \(\lambda \) where \(\lambda I-\varGamma \) is bijective. As a consequence of the Open Mapping Theorem , for any such \(\lambda \), the Resolvent Operator \((\lambda I - \varGamma )^{-1}\) is bounded. Using this construction, the spectrum (denoted by \(spec[\varGamma ]\)) of \(\varGamma \) is defined to be the complement of the Resolvent Set and hence is the set of complex numbers \(\lambda \) where \(\lambda I-\varGamma \) does not have a bounded inverse. This definition includes the eigenvalues of \(\varGamma \) (the so-called Point Spectrum defined by the existence of non-zero eigenvectors/eigenfunctions \(v \in \mathscr {V}\) such that \(\varGamma v = \lambda v\)) but also other points in what are termed the Continuous and Residual Spectrum. In finite dimensional spaces, the residual and continuous spectra are empty and the results of matrix algebra describe the spectrum completely in terms of eigenstructure. Finally,

  1.

    the spectral radius of a bounded operator \(\varGamma : \mathscr {V} \mapsto \mathscr {V}\) is defined by

    $$\begin{aligned} r(\varGamma ) = \sup \{|\lambda | : \lambda \in spec[\varGamma ] \}, \end{aligned}$$
    (2.74)

    a definition that reduces to that for matrices if \(\mathscr {V} = \mathscr {C}^p\). In particular, if \(\mathscr {V}\) is a Banach space, then

    $$\begin{aligned} r(\varGamma ) = \lim _{k \rightarrow \infty } \Vert \varGamma ^k\Vert ^{1/k}~ \end{aligned}$$
    (2.75)

    which relates the spectral radius to powers of \(\varGamma \). As \(\Vert \varGamma ^k\Vert \le \Vert \varGamma \Vert ^k\) for all \(k \ge 1\), this implies that

    $$\begin{aligned} r(\varGamma ) \le \Vert \varGamma \Vert \end{aligned}$$
    (2.76)

    and, for all \(\varepsilon >0\), there exists a real number \(M_{\varepsilon } \ge 1\) such that

    $$\begin{aligned} \Vert \varGamma ^k\Vert \le M_{\varepsilon } (r(\varGamma ) + \varepsilon )^k. \end{aligned}$$
    (2.77)
  2.

    The ideas of functions of operators and the Spectral Mapping Theorem, easily proven for matrices, can be extended to bounded operators mapping a Banach space into itself. More precisely, if f(z) has a power series expansion \(\sum _{j=0}^{\infty }f_j z^j\) with radius of convergence R, then \(f(\varGamma )\) is defined to be the operator \(\sum _{j=0}^{\infty }f_j \varGamma ^j\) which is convergent if \(r(\varGamma ) < R\). The spectrum of \(f(\varGamma )\) is simply \(\{z: z=f(\eta ),~ \eta \in spec[\varGamma ]\}\) or, more compactly,

    $$\begin{aligned} spec[f(\varGamma )] = f(spec[\varGamma ]). \end{aligned}$$
    (2.78)

    For example, the Resolvent \((\lambda I - \varGamma )^{-1}\) has the power series representation

    $$\begin{aligned} (\lambda I - \varGamma )^{-1} = \sum _{j=0}^{\infty } \lambda ^{-(j+1)}\varGamma ^j \end{aligned}$$
    (2.79)

    which is convergent if \(\varGamma \) has spectral radius strictly less than \(|\lambda |\). A sufficient condition for this is that \(\Vert \varGamma \Vert < |\lambda |\). The spectrum of the Resolvent is \(\{z: z=(\lambda - \eta )^{-1}, ~ \eta \in spec[\varGamma ] \}\).

The following result relates the spectral radius to the iterative learning control studies in this text. More precisely, the spectral radius describes convergence in norm of a simple, but typical, iteration formula .

Theorem 2.1

(Convergence in Norm and the Spectral Radius) Let \(\mathscr {V}\) be a Banach space. Then, given an arbitrary starting vector \(v_0 \in \mathscr {V}\) and a bounded linear operator \(\varGamma : \mathscr {V} \mapsto \mathscr {V}\), the sequence \(\{v_j\}_{j \ge 0}\) generated by the iteration \(v_{j+1} = \varGamma v_j, j \ge 0,\) converges (in norm) to zero if (a sufficient condition)  

$$\begin{aligned} r(\varGamma ) < 1. \end{aligned}$$
(2.80)

A sufficient condition for this to be true is that \(\Vert \varGamma \Vert <1\).

Proof

Note, using induction, \(v_j = \varGamma ^j v_0\) for all \(j \ge 0\). Using the notation above, the assumptions make possible the selection of \(\varepsilon >0\) such that \(r(\varGamma ) + \varepsilon <1\). It follows that, as required,

$$\begin{aligned} \Vert v_j\Vert = \Vert \varGamma ^jv_0\Vert \le \Vert \varGamma ^j\Vert \Vert v_0\Vert \le M_{\varepsilon }(r(\varGamma ) + \varepsilon )^j \rightarrow 0 \quad as\ j \rightarrow \infty . \end{aligned}$$
(2.81)

The final statement of the theorem follows as \(r(\varGamma ) \le \Vert \varGamma \Vert \), so that the norm condition is sufficient to ensure the required condition on the spectral radius. \(\square \)

Note the conceptual similarity of this result to the familiar results from discrete time, sampled data systems control where asymptotic stability of \(x_{j+1} = A x_j\) is equivalent to the condition \(r(A) < 1\) and hence is equivalent to the poles of the system’s transfer function being inside the unit circle of the complex plane.
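
In finite dimensions, Theorem 2.1 can be illustrated with a matrix whose spectral radius is less than one even though its induced norm exceeds one: the iterates still converge to zero, although their norms may grow before they decay. The sketch below (Python with numpy assumed, and a deliberately chosen non-normal matrix) is illustrative only.

```python
import numpy as np

Gamma = np.array([[0.5, 10.0],
                  [0.0, 0.5]])                   # r(Gamma) = 0.5 but ||Gamma|| >> 1
print(max(abs(np.linalg.eigvals(Gamma))))        # spectral radius 0.5
print(np.linalg.norm(Gamma, 2))                  # induced norm, much larger than 1

v = np.array([1.0, 1.0])
norms = []
for j in range(40):
    v = Gamma @ v                                # the iteration v_{j+1} = Gamma v_j
    norms.append(np.linalg.norm(v))

print(norms[0], norms[5], norms[-1])             # transient growth, then decay towards zero
```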

2.4 Hilbert Spaces

2.4.1 Inner Products and Norms

Although Banach spaces play a role in some areas of Control Theory and Optimization, the addition of geometrical structures plays an important role in algorithm design. The relevant structure is that of a Hilbert Space. More precisely, let \(\mathscr {V}\) be a real (respectively complex) Banach space endowed with an associated inner product \(\langle \cdot , \cdot \rangle : \mathscr {V} \times \mathscr {V} \mapsto \mathscr {R}\) (respectively \(\mathscr {C}\)) possessing the properties that, for all u, v, w in \(\mathscr {V}\) and real (respectively complex) scalars \(\lambda \),

$$\begin{aligned} \begin{array}{rcl} \langle u,v \rangle &{}=&{} \langle v,u \rangle ~~(respectively~\langle u,v \rangle = \overline{\langle v,u \rangle }), \\ \langle u,v+w \rangle &{}=&{} \langle u,v \rangle + \langle u,w \rangle , \\ \langle u,\lambda v \rangle &{}=&{} \lambda \langle u,v \rangle , \\ \langle v,v \rangle &{} \ge &{} 0 ~~and \\ \langle v,v \rangle &{} = &{} 0 \quad if, \, and \, only \, if,\,\, v=0. \end{array} \end{aligned}$$
(2.82)

Suppose also that the norm in \(\mathscr {V}\) can be computed from the inner product using the formula

$$\begin{aligned} \Vert v\Vert = \sqrt{\langle v,v \rangle }, \end{aligned}$$
(2.83)

then \(\mathscr {V}\) is said to be a real (respectively complex) Hilbert Space. Note that, if it is necessary to identify the space or some other aspect of the formulae arising in the theory, the norm and inner product may be given subscripts as an aide memoire to the reader. For example, to identify the space being considered, both \(\Vert v \Vert \) and \(\langle u,v \rangle \) can be written in the form \(\Vert v \Vert _{\mathscr {V}}\) and \(\langle u,v \rangle _{\mathscr {V}}\).

Examples of Hilbert spaces include:

  1.

    The space \(\mathscr {R}^p\) is a Hilbert space with inner product \(\langle u,v \rangle = u^TQv\) where Q is any symmetric, positive definite \(p \times p\) matrix.

  2.

    If \(- \infty < a < b < +\infty \), then \(L_2^p[a,b]\) is a Hilbert space with inner product

    $$\begin{aligned} \langle u,v \rangle = \int _a^b~ u^T(t)Q(t)v(t)dt \end{aligned}$$
    (2.84)

    where Q(t) is any piecewise continuous \(p \times p\) matrix satisfying an inequality of the form

    $$\begin{aligned} \alpha _1 I_p \le Q(t) \le \alpha _2 I_p, \quad for~all~t \in [a,b] \end{aligned}$$
    (2.85)

    and some scalars \(0 < \alpha _1 \le \alpha _2\). For example, if \(\alpha \ge 0\) and \(Q(t) = e^{2\alpha t}Q\) with Q a constant, symmetric, positive definite matrix with eigenvalues \(0 < q_1 \le q_2 \le \cdots \le q_p\), then the conditions are satisfied with \(\alpha _1 = q_1e^{2\alpha a}\) and \(\alpha _2 = q_p e^{2\alpha b}\).
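
The weighted inner product (2.84) can be approximated numerically by quadrature. The sketch below (Python with numpy assumed, \(Q(t)=e^{2\alpha t}Q\) with an illustrative constant positive definite Q, and a crude Riemann sum in place of the integral) also checks the Cauchy-Schwarz inequality discussed immediately below for the induced norm.

```python
import numpy as np

a, b, alpha = 0.0, 1.0, 0.5
N = 4001
t = np.linspace(a, b, N)
dt = t[1] - t[0]
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])                       # constant, symmetric, positive definite

def inner(u, v):
    # <u, v> ~ sum_k u(t_k)^T e^(2 alpha t_k) Q v(t_k) dt : a crude quadrature of (2.84)
    weights = np.exp(2 * alpha * t)
    return np.sum(weights * np.einsum('it,ij,jt->t', u, Q, v)) * dt

u = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])   # sampled 2 x N signals
v = np.vstack([t, 1.0 - t])

norm = lambda f: np.sqrt(inner(f, f))
print(abs(inner(u, v)) <= norm(u) * norm(v) + 1e-12)   # Cauchy-Schwarz inequality holds
```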

Finally, the inner product has a number of useful additional properties including the Cauchy-Schwarz Inequality which takes the form, for any u, v in \(\mathscr {V}\)

$$\begin{aligned} |\langle u,v \rangle | \le \Vert u\Vert \Vert v\Vert \end{aligned}$$
(2.86)

with equality holding if, and only if, v is a multiple of u. That is, if and only if, \(v=\lambda u\) for some scalar \(\lambda \).

Also the inner product allows the introduction of ideas of orthogonality. More precisely, two vectors u, v in \(\mathscr {V}\) are said to be orthogonal if, and only if, \(\langle u, v \rangle =0\), a definition that is consistent with that used for Euclidean geometry in \(\mathscr {R}^p\). The orthogonal complement of a vector subspace S of \(\mathscr {V}\) is denoted \(S^{\perp }\) where

$$\begin{aligned} S^{\perp } = \{v \in \mathscr {V}: \langle v, u \rangle =0\quad for \,all \, u \in S \} \end{aligned}$$
(2.87)

\(S^{\perp }\) is a closed subspace. If, in addition, S is a closed subspace, then \(\mathscr {V}\) has the direct sum decomposition

$$\begin{aligned} \mathscr {V} = S \oplus S^{\perp } = \{w=u+v: u \in S, \quad v \in S^{\perp }\}. \end{aligned}$$
(2.88)

If \(\{v_j\}_{j \ge 1}\) is a basis for \(\mathscr {V}\) and \(\langle v_j, v_k \rangle =0 ~\) whenever \( j \ne k \), then the basis set is an orthogonal basis. If, in addition, the vectors are suitably scaled (notably by replacing each \(v_j\) by the normalized vector \(v_j / \Vert v_j\Vert \)), the basis set is said to be an orthonormal basis with the defining property that, for all \(j \ge 1\) and \(k \ge 1\),

$$\begin{aligned} \langle v_j, v_k \rangle = \delta _{jk} \quad where \end{aligned}$$
(2.89)

the symbol \(\delta _{jk}\) is the Kronecker Delta defined by \(\delta _{jk}=0\) if \(j \ne k\) and is unity otherwise. Under these conditions, any vector \(v \in \mathscr {V}\) has the form

$$\begin{aligned} v=\sum _{j=1}^{\infty }~\alpha _j v_j \quad with \quad \alpha _j = \langle v_j, v \rangle , \, for \,j \ge 1, and~\Vert v\Vert ^2=\sum _{j=1}^{\infty }~|\alpha _j|^2 < \infty . \end{aligned}$$
(2.90)

Finally, the ideas of inner products can be applied to some normed spaces that are not complete. Examples include the space C[a,b] endowed with the inner product (and induced norm) used for the Hilbert space \(L_2[a,b]\) or, more generally, a dense, but not complete, subspace of a Hilbert space. Such spaces are said to be Pre-Hilbert Spaces. The geometry of such spaces is identical to that of Hilbert spaces but results that rely on the convergence of Cauchy sequences (and hence the existence of limits) need to be carefully considered.
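
In a finite dimensional Hilbert space the expansion (2.90) becomes a finite sum. The sketch below (Python with numpy assumed) builds an orthonormal basis of \(\mathscr {R}^p\) by QR factorization of a random matrix, computes the coefficients \(\alpha _j = \langle v_j, v \rangle \) and checks the associated identity \(\Vert v\Vert ^2 = \sum _j |\alpha _j|^2\); it is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(8)
p = 5
E, _ = np.linalg.qr(rng.standard_normal((p, p)))   # columns form an orthonormal basis
basis = [E[:, j] for j in range(p)]

v = rng.standard_normal(p)
alpha = np.array([bj @ v for bj in basis])         # alpha_j = <v_j, v>

print(np.allclose(sum(a * bj for a, bj in zip(alpha, basis)), v))   # v = sum alpha_j v_j
print(np.isclose(np.sum(alpha ** 2), v @ v))       # ||v||^2 = sum |alpha_j|^2
```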

2.4.2 Norm and Weak Convergence

The convergence of sequences in Hilbert spaces is defined as in any normed space but it is often called convergence in norm, convergence in the norm topology or, more simply, norm convergence. This is because, in Hilbert spaces, another useful definition of convergence is that of Weak Convergence. More precisely, a sequence \(\{v_j\}_{j \ge 0}\) in a Hilbert space \(\mathscr {V}\) is said to converge weakly to a vector \(v_{\infty } \in \mathscr {V}\) if, and only if,

$$\begin{aligned} \lim _{j \rightarrow \infty } \langle f,v_{\infty } - v_j \rangle = 0 \quad for \, all \, f \in \mathscr {V}. \end{aligned}$$
(2.91)

The Cauchy-Schwarz inequality immediately indicates that convergence in norm to \(v_{\infty }\) implies weak convergence to that vector. However, weak convergence of a sequence does not imply, necessarily, its convergence in norm.
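
The classical example is an orthonormal sequence \(\{e_j\}_{j \ge 1}\) in \(\ell _2\): it converges weakly to zero since \(\langle f, e_j \rangle = f_j \rightarrow 0\) for every \(f \in \ell _2\), yet \(\Vert e_j\Vert = 1\) for all j so there is no convergence in norm. A finite truncation of this example (illustrative only) is sketched below.

```python
import numpy as np

# Sketch: finite truncation of the classical l^2 example.  The inner products
# <f, e_j> tend to zero while the norms ||e_j|| stay equal to one.
N = 2000
f = 1.0 / (1.0 + np.arange(N))       # a fixed square-summable vector (illustrative)
for j in [1, 10, 100, 1000]:
    e_j = np.zeros(N)
    e_j[j] = 1.0                     # j-th basis vector
    print(j, np.dot(f, e_j), np.linalg.norm(e_j))   # inner product -> 0, norm stays 1
```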

The limit need only be verified on a dense subset of \(\mathscr {V}\), as the following result shows.

Theorem 2.2

Using the notation above, suppose that the sequence \(\{v_j\}_{j \ge 0}\) is bounded in the sense that there exists a real scalar M such that \(\Vert v_j\Vert \le M\) for all \(j \ge 0\). Suppose also that, for some dense subset \(S \subset \mathscr {V}\)

$$\begin{aligned} \lim _{j \rightarrow \infty } ~\langle f,v_{\infty } - v_j \rangle = 0 \quad for \, all \, f \in S. \end{aligned}$$
(2.92)

Then, \(\{v_j\}_{j \ge 0}\) converges weakly to \(v_{\infty }\).

Proof

First note that \(\Vert v_{\infty } - v_j\Vert \le \Vert v_{\infty }\Vert + \Vert v_j\Vert \le \Vert v_{\infty }\Vert + M\). Next, for an arbitrary \(f \in \mathscr {V}\) and any \(f_{\varepsilon } \in \mathscr {V}\), write

$$\begin{aligned} \langle f,v_{\infty } - v_j \rangle = \langle f - f_{\varepsilon },v_{\infty } - v_j \rangle + \langle f_{\varepsilon },v_{\infty }- v_j \rangle . \end{aligned}$$
(2.93)

Let \(\varepsilon >0\) be arbitrary and choose \(f_{\varepsilon } \in S\) so that \(\Vert f - f_{\varepsilon }\Vert < \varepsilon \), then the inequality

$$\begin{aligned} |\langle f,v_{\infty } - v_j \rangle | \le \Vert f - f_{\varepsilon }\Vert \Vert v_{\infty } - v_j \Vert + |\langle f_{\varepsilon },v_{\infty }- v_j \rangle | \end{aligned}$$
(2.94)

indicates that

$$\begin{aligned} \limsup _{j \rightarrow \infty }|\langle f,v_{\infty } - v_j \rangle | \le \varepsilon (\Vert v_{\infty }\Vert +M ) \end{aligned}$$
(2.95)

which proves the result since the left hand side is independent of \(\varepsilon \) and \(\varepsilon > 0 \) is arbitrary. \(\square \)

Another useful property of weak convergence is that of guaranteed convergence of subsequences. This is stated as follows and is usually associated with the notion of weak compactness of the closed unit ball B(0; 1) in \(\mathscr {V}\).

Theorem 2.3

(Ascoli’s Theorem) If \(\mathscr {V}\) is a Hilbert space and \(S=\{v_j\}_{j \ge 0}\) is an infinite but bounded sequence of vectors, then S has a subsequence \(S_1 = \{v_{k_j}\}_{j \ge 0}\) that converges weakly to some vector \(v_{\infty }\) in \(\mathscr {V}\).

The result states that all bounded sequences in any Hilbert space contain weakly convergent subsequences. If the convergent subsequence is removed from the original sequence, it will leave either a finite set (a situation which implies that the sequence itself converges weakly to \(v_{\infty }\)) or an infinite sequence. In the second case, the remaining sequence is also bounded and hence (by Ascoli’s Theorem) it, too, has a subsequence converging weakly to some (possibly different) weak limit \({\hat{v}}_{\infty } \in \mathscr {V}\). It is concluded that there is a possibility that S has many subsequences with different weak limits.

The essential property needed for weak convergence is boundedness of the sequence. The following result provides some insight into possibilities.

Theorem 2.4

(Weak Convergence and Operator Norms) Any iteration \(v_{j+1}=\varGamma v_j, ~ j \ge 0\), in a real Hilbert space \(\mathscr {V}\) where \(\varGamma : \mathscr {V} \rightarrow \mathscr {V}\) is a bounded linear operator with norm \(\Vert \varGamma \Vert \le 1\) generates a sequence \(\{v_j\}_{j \ge 0}\) that is bounded in norm and has weakly convergent subsequences. If \(\Vert \varGamma \Vert <1\), then the sequence converges in norm to zero.

Proof

As \(v_j = \varGamma ^j v_0\), it follows that \(\Vert v_j \Vert \le \Vert \varGamma \Vert ^j\Vert v_0\Vert \le \Vert v_0\Vert \), so the sequence is bounded in norm. Ascoli’s Theorem then indicates the existence of a weak limit of some subsequence. Convergence in norm to zero when \(\Vert \varGamma \Vert <1\) follows from the same bound. \(\square \)

Iterations of the form \(v_{j+1}=\varGamma v_j\) appear regularly in this text. Theorems 2.1 and 2.4 above provide two conditions for some form of convergence. It is worth noting that the values of the spectral radius or norm of the operator \(\varGamma \) are central to the stated results. As \(r(\varGamma ) \le \Vert \varGamma \Vert \), the use of the spectral radius will produce the best prediction of some form of convergence. This is particularly apparent in the case when \(r(\varGamma ) < 1 < \Vert \varGamma \Vert \) when the norm cannot be used to prove convergence but the use of the spectral radius indicates convergence in norm to zero. The more difficult, but important, case that also plays a role in iterative control is the case when \(r(\varGamma ) = \Vert \varGamma \Vert = 1\) when weak convergence of subsequences is guaranteed by Theorem 2.4 but convergence in norm is not covered by either result.
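
The distinction between the spectral radius and the operator norm is easily illustrated in the matrix case. The sketch below (an illustrative example, not from the text) iterates \(v_{j+1}=\varGamma v_j\) with a \(2 \times 2\) matrix for which \(r(\varGamma ) < 1 < \Vert \varGamma \Vert \); the norms \(\Vert v_j\Vert \) may grow initially but eventually decay to zero.

```python
import numpy as np

# Sketch: an iteration v_{j+1} = Gamma v_j with r(Gamma) < 1 < ||Gamma||.
# The norm bound alone cannot establish convergence, but the spectral radius does.
Gamma = np.array([[0.5, 10.0],
                  [0.0,  0.5]])
spectral_radius = np.max(np.abs(np.linalg.eigvals(Gamma)))   # = 0.5
operator_norm = np.linalg.norm(Gamma, 2)                     # > 1 (largest singular value)
print(spectral_radius, operator_norm)

v = np.array([1.0, 1.0])
for j in range(60):
    v = Gamma @ v
print(np.linalg.norm(v))   # eventually decays to (numerically) zero despite ||Gamma|| > 1
```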

2.4.3 Adjoint and Self-adjoint Operators in Hilbert Space

Suppose that \(\varGamma : \mathscr {V} \mapsto \mathscr {W}\) is a bounded linear operator mapping a real or complex Hilbert space \(\mathscr {V}\) into a real or complex Hilbert space \(\mathscr {W}\). The Adjoint Operator \(\varGamma ^*: \mathscr {W} \mapsto \mathscr {V}\) is the uniquely defined bounded linear operator mapping \(\mathscr {W}\) into \(\mathscr {V}\) and satisfying the identity, for all \(u \in \mathscr {W}\) and \(v \in \mathscr {V}\),

$$\begin{aligned} \langle u,\varGamma v \rangle _{\mathscr {W}} = \langle \varGamma ^* u, v \rangle _{\mathscr {V}} \end{aligned}$$
(2.96)

There are many general links between an operator and its adjoint. These include the additive, multiplicative and inversion rules

$$\begin{aligned} (\varGamma _1 + \varGamma _2)^*&= \varGamma _1^* + \varGamma _2^*, \quad \nonumber \\ (\varGamma _1 \varGamma _2)^* = \varGamma _2^*\varGamma _1^*&and, if \, \mathscr {V}=\mathscr {W}, \quad (\varGamma ^{-1})^* = (\varGamma ^*)^{-1} \end{aligned}$$
(2.97)

(when the inverse exists). The cases of real and complex Hilbert spaces need a little care as, for \(\varGamma : \mathscr {V} \mapsto \mathscr {W}\) and any scalar \(\lambda \), the adjoint \((\lambda \varGamma )^* = \lambda \varGamma ^*\) if \(\mathscr {V}\) is a real Hilbert space but equal to \(\overline{\lambda } \varGamma ^*\) if \(\mathscr {V}\) is a complex Hilbert space. Also

$$\begin{aligned} (\varGamma ^*)^* = \varGamma . \end{aligned}$$
(2.98)

A result that plays a role in the following text expresses the adjoint of a map into a product space in terms of adjoints of operators associated with each component.

Theorem 2.5

(The Adjoint of a Map into a Product Hilbert Space)  Let \(\mathscr {V}, \mathscr {W}_1, \ldots , \mathscr {W}_p\) be real Hilbert spaces and define the product Hilbert space \(\mathscr {W}_1 \times \cdots \times \mathscr {W}_p \) to be the product space with inner product and induced norm defined by

$$\begin{aligned} \begin{array}{rcl} \langle (w_1, \ldots , w_p),(z_1, \ldots , z_p) \rangle _{\mathscr {W}_1 \times \cdots \times \mathscr {W}_p } &{} = &{} \sum _{j=1}^p~ \langle w_j, z_j \rangle _{\mathscr {W}_j} \\ and \quad \quad \Vert (w_1, \ldots , w_p)\Vert ^2_{\mathscr {W}_1 \times \cdots \times \mathscr {W}_p } &{} = &{} \sum _{j=1}^p~ \Vert w_j\Vert ^2_{\mathscr {W}_j}. \end{array} \end{aligned}$$
(2.99)

Let the operator \(G: \mathscr {V} \rightarrow \mathscr {W}_1 \times \cdots \times \mathscr {W}_p\) be linear and bounded. Then G can be represented by the mapping, for all \(v \in \mathscr {V}\),

$$\begin{aligned} Gv = (G_1v, G_2v, \ldots ,G_pv) \end{aligned}$$
(2.100)

where \(G_j: \mathscr {V} \rightarrow \mathscr {W}_j\) is linear and bounded. The adjoint map \(G^* : \mathscr {W}_1 \times \cdots \times \mathscr {W}_p \rightarrow \mathscr {V}\) is the bounded linear operator defined by the relation,

$$\begin{aligned} G^*(w_1, w_2, \ldots , w_p) = G_1^*w_1 + G_2^*w_2 + \cdots + G_p^*w_p, \end{aligned}$$
(2.101)

where, for \(1 \le j \le p\), \(G^*_j: \mathscr {W}_j \rightarrow \mathscr {V}\) is the adjoint of \(G_j\).

Proof

The characterization of G in terms of the \(G_j\) follows easily from the linearity of G. The adjoint of G is identified from the equation

$$\begin{aligned} \begin{array}{rcl} \langle (w_1, \ldots , w_p),Gv \rangle _{\mathscr {W}_1 \times \cdots \times \mathscr {W}_p } &{} =&{} \sum _{j=1}^p~ \langle w_j, G_jv \rangle _{\mathscr {W}_j} \\ =\sum _{j=1}^p~ \langle G^*_jw_j, v \rangle _{\mathscr {V}} &{} = &{} \langle ~\sum _{j=1}^p~G^*_jw_j, v \rangle _{\mathscr {V}}. \\ \end{array} \end{aligned}$$
(2.102)

The theorem is proved by comparing this with \(\langle G^*(w_1, w_2, \ldots ,w_p), v \rangle _{\mathscr {V}}\). \(\square \)

An operator \(\varGamma : \mathscr {V} \mapsto \mathscr {V}\) is self adjoint if, and only if, \(\varGamma = \varGamma ^*\). If \(\varGamma \) is self adjoint then \(\langle u, \varGamma u \rangle _{\mathscr {V}}\) is always real. \(\varGamma \) is then said to be positive if \(\langle u, \varGamma u \rangle \ge 0\) for all \(u \in \mathscr {V}\), positive definite if it is positive and \(\langle u, \varGamma u \rangle = 0\) if, and only if, \(u=0\) and positive semi-definite if it is positive but there exists a non-zero u such that \(\langle u, \varGamma u \rangle = 0\). Positive commuting operators have special properties as follows:

Theorem 2.6

If \(~\varGamma _1,\varGamma _2 \) and \(\varGamma _3\) are linear, bounded, self-adjoint, positive, commuting operators mapping a Hilbert space \(\mathscr {V}\) into itself, then

$$ \begin{aligned} \begin{array}{rcl} \varGamma _1\ge 0~ \& ~\varGamma _2 \ge 0 &{}\Rightarrow &{}\varGamma _1 \varGamma _2 \ge 0\\ \varGamma _1 \ge \varGamma _2 &{}\Rightarrow &{}\varGamma _1 \varGamma _3 \ge \varGamma _2 \varGamma _3 \end{array} \end{aligned}$$
(2.103)

The form of the adjoint operator depends on the spaces used and, in particular, on the form of inner product used. For example, matrix algebra proves that,

Theorem 2.7

(Adjoint of a Matrix Operator) Let \(\mathscr {V}\) be \(\mathscr {R}^q\) with inner product \(\langle {\hat{v}}, v \rangle _{\mathscr {V}} = {\hat{v}}^TRv\) (where \(R=R^T >0\)) and \(\mathscr {W}\) be \(\mathscr {R}^p\) with inner product \(\langle {\hat{w}}, w \rangle _{\mathscr {W}} = {\hat{w}}^TQw\) (where \(Q=Q^T >0\)). If \(\varGamma : \mathscr {V} \rightarrow \mathscr {W}\) is a \(p \times q\) real matrix, then its adjoint \(\varGamma ^*\) satisfies, for all v and w,

$$\begin{aligned} w^TQ\varGamma v = (\varGamma ^* w)^TRv \quad and \, hence \quad \varGamma ^* = R^{-1}\varGamma ^T Q. \end{aligned}$$
(2.104)

In particular, when \(R=I_q\) and \(Q=I_p\), the adjoint is simply the transpose of the matrix \(\varGamma \).

Note: it is conventional to use the \(^*\) notation to denote the adjoint operator but it is also often used to denote the complex conjugate transpose of a matrix. There is a possibility of confusion from time to time but careful attention to the context of the analysis should easily resolve any ambiguity.
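
A numerical check of Theorem 2.7 is immediate in NumPy. The sketch below (with arbitrary, illustrative choices of dimensions, weights and data) verifies the defining identity (2.96) for \(\varGamma ^* = R^{-1}\varGamma ^T Q\).

```python
import numpy as np

# Sketch: numerical check of Theorem 2.7 (illustrative data; R and Q are made
# symmetric positive definite).
rng = np.random.default_rng(1)
q, p = 3, 4                                   # V = R^q, W = R^p
A = rng.standard_normal((q, q)); R = A @ A.T + q * np.eye(q)
B = rng.standard_normal((p, p)); Q = B @ B.T + p * np.eye(p)
Gamma = rng.standard_normal((p, q))           # Gamma : V -> W

Gamma_star = np.linalg.solve(R, Gamma.T @ Q)  # Gamma* = R^{-1} Gamma^T Q

v = rng.standard_normal(q)
w = rng.standard_normal(p)
lhs = w @ Q @ (Gamma @ v)                     # <w, Gamma v>_W
rhs = (Gamma_star @ w) @ R @ v                # <Gamma* w, v>_V
print(np.isclose(lhs, rhs))                   # True
```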

The operator \(\varGamma ^* \varGamma \) is self adjoint. It is also positive as \(\langle u, \varGamma ^*\varGamma u \rangle = \langle \varGamma u, \varGamma u \rangle = \Vert \varGamma u\Vert ^2 \ge 0\). As a consequence,

Theorem 2.8

(Invertibility and Positivity of Operators) With the above notation,

  1. 1.

    The operator \(\varGamma ^*\varGamma : \mathscr {V} \rightarrow \mathscr {V}\) is positive definite if, and only if, \(ker[\varGamma ] = \{0\}\).

  2. 2.

    The operator \(\varGamma : \mathscr {V} \rightarrow \mathscr {W}\) has a bounded inverse if, and only if, there exists a constant \(\alpha >0\) such that \(\varGamma ^*\varGamma \ge \alpha I_{\mathscr {V}}\) and \(\varGamma \varGamma ^* \ge \alpha I_{\mathscr {W}}\).

These properties link invertibility to positivity as follows,

Theorem 2.9

(Invertibility and Positivity of Operators) An operator \(\varGamma : \mathscr {V} \rightarrow \mathscr {V}\) has a bounded inverse on the Hilbert space \(\mathscr {V}\) if there exists a real number \(\varepsilon _0 > 0\) such that

$$\begin{aligned} \varGamma + \varGamma ^* \ge \varepsilon _0^2I. \end{aligned}$$
(2.105)

Proof

Note that, for any real scalar \(\lambda > 0\),

$$\begin{aligned} \begin{array}{rcl} 0 &{} \le &{} (I-\lambda \varGamma ^*)(I-\lambda \varGamma ) = I - \lambda (\varGamma + \varGamma ^*) + \lambda ^2 \varGamma ^* \varGamma \\ &{} \le &{} (1 - \lambda \varepsilon _0^2)I + \lambda ^2 \varGamma ^* \varGamma . \end{array} \end{aligned}$$
(2.106)

Exactly the same relationship for \(\varGamma \varGamma ^*\) is obtained using \((I-\lambda \varGamma )(I-\lambda \varGamma ^*)\) so that the positivity condition of the previous result holds by choosing \(\lambda \) so that \(\alpha = \lambda ^{-2}(\lambda \varepsilon _0^2 - 1) > 0\). \(\square \)

The operator norms \(\Vert \varGamma \Vert \) and \(\Vert \varGamma ^*\Vert \) are related by the expression,

$$\begin{aligned} \Vert \varGamma \Vert = \Vert \varGamma ^*\Vert \end{aligned}$$
(2.107)

and the range and kernels of the operators satisfy the orthogonality relations

$$\begin{aligned} \begin{array}{rcl} (a) \quad \mathscr {R}[\varGamma ^*]^{\perp } = ker[\varGamma ] \quad &{} and \, hence &{} \quad \mathscr {R}[\varGamma ]^{\perp } = ker[\varGamma ^*]\\ (b) \quad \overline{\mathscr {R}[\varGamma ^*]} = ker[\varGamma ]^{\perp } \quad &{} and \, hence &{} \quad \overline{\mathscr {R}[\varGamma ]} = ker[\varGamma ^*]^{\perp }~ \end{array} \end{aligned}$$
(2.108)

from which the Projection Theorem in Hilbert space (Theorem 2.17) gives

$$\begin{aligned} \mathscr {V} = \overline{\mathscr {R}[\varGamma ^*]} \oplus ker[\varGamma ] \quad and \quad \mathscr {W} = \overline{\mathscr {R}[\varGamma ]} \oplus ker[\varGamma ^*] \end{aligned}$$
(2.109)

The following result provides an important property of a Hilbert space in terms of the range of an operator and the kernel of its adjoint. The result has close links to the above but, more formally,

Theorem 2.10

(Denseness and the Orthogonal Complement of the Kernel) Suppose that \(~\varGamma : \mathscr {V} \mapsto \mathscr {W}\) where \(\mathscr {V}\) and \(\mathscr {W}\) are Hilbert spaces. Then the range space \(\mathscr {R}[\varGamma ^*]\) is dense in \(\mathscr {V}\) if, and only if, \(ker[\varGamma ] = \{0\}\).

Proof

If \(ker[\varGamma ] = \{0\}\), suppose that \(\mathscr {R}[\varGamma ^*]\) is not dense. It follows that its closure is a proper closed subspace S of \(\mathscr {V}\) with an orthogonal complement \(S^{\perp }\) containing non-zero vectors. Let \( v \in S^{\perp }\) be non-zero and write \(\langle v, \varGamma ^* w \rangle = 0\) for all \(w \in \mathscr {W}\). It follows that \(\langle \varGamma v, w \rangle = 0\) for all \(w \in \mathscr {W}\) so that (choosing \( w= \varGamma v\)) \( \varGamma v = 0\) which contradicts the assumption that \(ker[ \varGamma ]= \{0\}\). Next suppose that \(\mathscr {R}[\varGamma ^*]\) is dense. It follows that the condition \(\langle v, \varGamma ^* w \rangle = 0\) for all \(w \in \mathscr {W} \) implies that \(v=0\) which trivially leads to \(\mathscr {R}[\varGamma ^*] ^{\perp } = ker[\varGamma ] = \{0\}\) as required. \(\square \)

The properties of the range spaces of an operator and its adjoint are also connected as follows:

Theorem 2.11

(Closed Range Theorem) If \(~\varGamma : \mathscr {V} \rightarrow \mathscr {W}\) is a bounded linear operator between Hilbert spaces, then \(\varGamma \) has a closed range in \(\mathscr {W}\) if, and only if, the range of the adjoint \(\varGamma ^*\) is closed in \(\mathscr {V}\).

The norm of a self adjoint operator \(\varGamma : \mathscr {V} \rightarrow \mathscr {V}\) is related to the values taken by an associated quadratic form. More precisely, its norm can be computed from the parameters

$$\begin{aligned} \begin{array}{rcl} a &{} = &{} \inf \{ \langle u, \varGamma u \rangle : u \in \mathscr {V}~and~\Vert u\Vert =1\} \\ and \quad b &{} = &{} \sup \{ \langle u, \varGamma u \rangle : u \in \mathscr {V}~and~\Vert u\Vert =1\}\\ to \, be \quad \Vert \varGamma \Vert &{} = &{} \max \{|a|,|b|\} \\ and, \, in~~ particular,\quad \Vert \varGamma \Vert &{} = &{} r(\varGamma ). \end{array} \end{aligned}$$
(2.110)

This expression can be written in the form, where I is the identity operator,

$$\begin{aligned} aI \le \varGamma \le bI \end{aligned}$$
(2.111)

which forms the basis of the theorem

Theorem 2.12

(Invertibility of Self Adjoint Operators) With the notation used above, suppose that \(\varGamma : \mathscr {V} \rightarrow \mathscr {V}\) is self adjoint. Then the spectrum of \(~\varGamma \) contains only real numbers in the closed interval [ab]. In particular, \(\varGamma \) has a bounded inverse if \(ab >0\).

A useful relationship valid when \(\varGamma \) is self adjoint and positive is

$$\begin{aligned} \Vert \varGamma \Vert = \sup \{ \langle u, \varGamma u \rangle : u \in \mathscr {V}~and ~~ \Vert u\Vert =1\} \end{aligned}$$
(2.112)

which immediately yields the result that, for any bounded \(\varGamma : \mathscr {V} \rightarrow \mathscr {W}\),

$$\begin{aligned} \Vert \varGamma ^*\varGamma \Vert = \Vert \varGamma \Vert ^2. \end{aligned}$$
(2.113)

A useful consequence of this is that

Theorem 2.13

(Norm of a Matrix Operator) Using the notation and assumptions of Theorem 2.7,

$$\begin{aligned} \Vert \varGamma \Vert ^2~ = \sup \{ \langle u, \varGamma ^*\varGamma u \rangle : u \in \mathscr {V} ~ and ~~ \Vert u\Vert =1\} = r(\varGamma ^*\varGamma ) \end{aligned}$$
(2.114)

where \(r(\varGamma ^*\varGamma )\) is the spectral radius of \(\varGamma ^* \varGamma = R^{-1}\varGamma ^T Q \varGamma \). Moreover, for all choices of Q and R, \(\Vert \varGamma \Vert \) is the largest singular value of \(~\varGamma _{QR} = Q^{\frac{1}{2}}\varGamma R^{-\frac{1}{2}}\) and

$$\begin{aligned} \Vert \varGamma ^*\Vert = \Vert \varGamma \Vert . \end{aligned}$$
(2.115)

Proof

The proof follows from Theorem 2.7 using Lagrange multiplier techniques to solve (2.112), applied to \(\varGamma ^*\varGamma \) and regarded as a constrained optimization problem. Next, it is easy to see that \(\det (\lambda I_p - \varGamma \varGamma ^*) = \lambda ^{p-q}\det (\lambda I_q - \varGamma ^* \varGamma ) \) so that the eigenvalues of \(\varGamma ^* \varGamma \) and \(\varGamma \varGamma ^*\) differ, at most, only by a number of zero eigenvalues. Finally, the eigenvalues of \(\varGamma ^* \varGamma \) are the squares of the singular values of \(Q^{\frac{1}{2}}\varGamma R^{-\frac{1}{2}}\) as \(\varGamma ^* \varGamma = R^{-\frac{1}{2}}\left[ \left( Q^{\frac{1}{2}}\varGamma R^{-\frac{1}{2}}\right) ^TQ^{\frac{1}{2}}\varGamma R^{-\frac{1}{2}}\right] R^{\frac{1}{2}}\). That is, \(\varGamma ^* \varGamma \) and \(\varGamma ^T_{QR}\varGamma _{QR}\) are related by a similarity transformation. \(\square \)
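
The two characterizations of the weighted operator norm in Theorem 2.13 can be compared numerically. The following sketch (illustrative data only) computes \(\Vert \varGamma \Vert \) both as the largest singular value of \(Q^{\frac{1}{2}}\varGamma R^{-\frac{1}{2}}\) and as \(r(\varGamma ^*\varGamma )^{1/2}\).

```python
import numpy as np

# Sketch: the weighted operator norm of Theorem 2.13 computed two ways
# (illustrative data; Q and R are symmetric positive definite weights).
rng = np.random.default_rng(2)
q, p = 3, 4
A = rng.standard_normal((q, q)); R = A @ A.T + q * np.eye(q)
B = rng.standard_normal((p, p)); Q = B @ B.T + p * np.eye(p)
Gamma = rng.standard_normal((p, q))

def sqrtm_spd(M):
    # symmetric square root of a symmetric positive definite matrix
    d, U = np.linalg.eigh(M)
    return U @ np.diag(np.sqrt(d)) @ U.T

Q_half, R_half = sqrtm_spd(Q), sqrtm_spd(R)
Gamma_QR = Q_half @ Gamma @ np.linalg.inv(R_half)            # Q^{1/2} Gamma R^{-1/2}
norm_from_svd = np.linalg.svd(Gamma_QR, compute_uv=False)[0]

Gamma_star_Gamma = np.linalg.solve(R, Gamma.T @ Q @ Gamma)   # Gamma* Gamma = R^{-1} Gamma^T Q Gamma
norm_from_radius = np.sqrt(np.max(np.abs(np.linalg.eigvals(Gamma_star_Gamma))))

print(np.isclose(norm_from_svd, norm_from_radius))           # True: ||Gamma||^2 = r(Gamma* Gamma)
```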

If \(\varGamma : \mathscr {V} \mapsto \mathscr {V}\) is positive and self adjoint there exists a unique positive, self adjoint operator \(~{\hat{\varGamma }}: \mathscr {V} \mapsto \mathscr {V}\) with the property that \(\varGamma = {\hat{\varGamma }} {\hat{\varGamma }}\). For this reason, \({\hat{\varGamma }}\) is said to be the unique positive square root and is written \({\hat{\varGamma }} = \varGamma ^{1/2}\). The bounded, positive, self-adjoint linear operator \(\varGamma ^{1/2}\) has the properties that it commutes with every operator that commutes with \(\varGamma \) and

$$\begin{aligned} \begin{array}{rcl} \varGamma ^{1/2} = (\varGamma ^{1/2})^* \ge 0,&{}\quad ker[\varGamma ^{1/2}] = ker[\varGamma ] &{} \quad and\quad \mathscr {R}[\varGamma ] \subset \mathscr {R}[\varGamma ^{1/2}] \\ \end{array} \end{aligned}$$
(2.116)

so that, in particular, \(\varGamma ^{1/2}\) is positive definite if, and only if, \(\varGamma \) is positive definite.

The spectrum of a self adjoint operator \(\varGamma \) lies in the closed ball \(B_c(0;r(\varGamma )) \subset B_c(0;\Vert \varGamma \Vert )\). Using (2.110), the spectrum of \(\varGamma - \frac{b+a}{2}I\) lies in the closed ball \(B_c(0,\frac{b-a}{2})\) and hence, using the spectral mapping Theorem, the spectrum of \(\varGamma \) lies in the shifted closed ball \( \frac{b+a}{2} + B_c(0,\frac{b-a}{2})\). In particular,

Theorem 2.14

(Invertibility of \((I+\varGamma )^{-1}\)) Suppose that \(\varGamma : \mathscr {Y} \rightarrow \mathscr {Y}\) where \(\mathscr {Y}\) is a real Hilbert space is bounded, self adjoint and positive. Then \(I+\varGamma \) is a bijection and the inverse operator \((I+\varGamma )^{-1}\) is well-defined and bounded.

Proof

Using the discussion preceding this result, \(a \ge 0\) and \(b \le \Vert \varGamma \Vert \) and hence

$$\begin{aligned} spec[\varGamma ] \subset \frac{\Vert \varGamma \Vert }{2}+ B_c(0,\frac{\Vert \varGamma \Vert }{2})~. \end{aligned}$$
(2.117)

The proof is now complete as \(-1\) is not in the spectrum of \(\varGamma \). \(\square \)

Note: Operators of this type play a central role in Iterative Algorithms.

Finally, useful conditions for \(\varGamma _1 \varGamma _2\) to be self adjoint can be stated as follows

Theorem 2.15

(When is \(\varGamma _1\varGamma _2\) self adjoint?) Suppose that the two self-adjoint operators \(\varGamma _1\) and \(\varGamma _2\) map a real Hilbert space \(\mathscr {Y}\) into itself and that \(\varGamma _2\) is positive definite. Then the product \(\varGamma _1 \varGamma _2\) is self-adjoint if the inner product in \(\mathscr {Y}\) is replaced by the new inner product

$$\begin{aligned} \langle u,v \rangle _0 = \langle u, \varGamma _2 v \rangle _{\mathscr {Y}}. \end{aligned}$$
(2.118)

The two topologies are equivalent if there exists a real scalar \(\varepsilon _0^2 > 0\) such that \(\varGamma _2 \ge \varepsilon _0^2I\).

Proof

The bilinear form \(\langle u,v \rangle _0\) satisfies all the requirements of an inner product, with associated norm \(\Vert \cdot \Vert _0\). The self-adjoint property follows as

$$\begin{aligned} \langle u,\varGamma _1 \varGamma _2 v \rangle _0 = \langle u, \varGamma _2 \varGamma _1\varGamma _2 v \rangle _{\mathscr {Y}}= \langle \varGamma _1\varGamma _2 u, \varGamma _2 v \rangle _{\mathscr {Y}}=\langle \varGamma _1 \varGamma _2u, v \rangle _0 \end{aligned}$$
(2.119)

Finally, when such an \(\varepsilon _0^2>0\) exists, the topological equivalence of the two norms follows from

$$\begin{aligned} \varepsilon _0^2 \Vert u\Vert ^2_{\mathscr {Y}} \le \langle u,\varGamma _2 u \rangle _{\mathscr {Y}} = \Vert u\Vert _0^2 \le \Vert \varGamma _2\Vert \Vert u\Vert ^2_{\mathscr {Y}}. \end{aligned}$$
(2.120)

\(\square \)

Note: This result plays a role in the analysis of the convergence and robustness of many of the algorithms in the following chapters.

2.5 Real Hilbert Spaces, Convex Sets and Projections

The structure of real Hilbert spaces provides a powerful set of results related to optimization. These results are expressed in terms of Projection onto Convex Sets. A convex set \(S \subset \mathscr {V}\) in a real Hilbert space \(\mathscr {V}\) is any set satisfying the condition that, for any two points u, v in S, the vector

$$\begin{aligned} w= \lambda u + (1 - \lambda ) v \in S \quad for ~ all ~ \lambda \in [0,1]. \end{aligned}$$
(2.121)

(where \(\lambda \) is a real number). The vector w is said to be a convex combination of u and v. The Convex Hull of a set \(S \subset \mathscr {V}\) is the smallest convex set containing S.

Suppose that \(v_0\) is an arbitrary point of \(\mathscr {V}\) and consider the problem of finding the point in a convex set S that is closest to \(v_0\). This problem can be written formally as the solution (if it exists) of the optimization problem

$$\begin{aligned} v_1 = \arg \min \{\Vert v_0 -v\Vert : v \in S \} \end{aligned}$$
(2.122)

That is, \(v_1\) is the vector \(v \in S\) that minimizes the norm \(\Vert v_0 - v \Vert \) and hence is the nearest point in S to \(v_0\). For visualization purposes, \(v_1\) can be thought of as the projection of \(v_0\) onto the set S. In general, it is possible that no such point exists but, for many problems in practice, a solution does exist. The most useful theorem characterizing the existence of \(v_1\) and its relationship to \(v_0\) is as follows:

Theorem 2.16

(Minimum Distance to a Closed Convex Set) Suppose that S is a closed, convex set in the real Hilbert space \(\mathscr {V}\). If \(v_0 \in \mathscr {V}\), then the optimization Problem (2.122) has a unique solution \(~v_1 \in S\). A necessary and sufficient condition for \(~v_1\) to be that solution is that

$$\begin{aligned} \langle v-v_1,v_1 - v_0 \rangle _{\mathscr {V}} \ge 0 \quad for ~ all ~ v \in S. \end{aligned}$$
(2.123)
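
The variational inequality (2.123) is easily checked numerically for a simple closed convex set. The sketch below (an illustration only) projects a point of \(\mathscr {R}^3\) onto the closed unit ball and samples the inequality over points of the ball.

```python
import numpy as np

# Sketch: projection of v0 onto the closed unit ball S = {v : ||v|| <= 1} in R^3
# (an illustrative closed convex set) and a spot check of inequality (2.123).
rng = np.random.default_rng(3)
v0 = np.array([2.0, -1.0, 0.5])
v1 = v0 / max(1.0, np.linalg.norm(v0))        # closest point of the ball to v0

ok = True
for _ in range(1000):
    v = rng.standard_normal(3)
    v = v / max(1.0, np.linalg.norm(v))       # force the sample v into the ball
    ok = ok and np.dot(v - v1, v1 - v0) >= -1e-12
print(ok)                                     # True: <v - v1, v1 - v0> >= 0 on S
```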

A particular case of interest is when S is a closed vector subspace of \(\mathscr {V}\).

Theorem 2.17

(The Projection Theorem in Hilbert Space) Suppose that S is a closed vector subspace in the real Hilbert space \(\mathscr {V}\). If \(v_0 \in \mathscr {V}\), then the optimization Problem (2.122) has a unique solution \(~v_1 \in S\). A necessary and sufficient condition for \(~v_1\) to be that solution is that the following orthogonality condition is true,

$$\begin{aligned} \langle v-v_1,v_1 - v_0 \rangle _{\mathscr {V}} = 0 \quad for ~ all ~ v \in S. \end{aligned}$$
(2.124)

In particular, as \(v-v_1 \in S\) is arbitrary, the condition reduces to

$$\begin{aligned} \langle v,v_1 - v_0 \rangle _{\mathscr {V}} = 0 \quad for ~ all ~ v \in S~ \end{aligned}$$
(2.125)

which is simply the requirement that \(v_1 - v_0\) is orthogonal to every vector in S.

Proof

The existence and uniqueness of \(v_1\) follows from the previous theorem as does the requirement that \(\langle v-v_1,v_1 - v_0 \rangle _{\mathscr {V}} \ge 0 \) for all \(v \in S\). Suppose that there exists a vector \(v \in S\) such that \(\langle v-v_1,v_1 - v_0 \rangle _{\mathscr {V}} >0 \), then, noting that \({\hat{v}} = -v + 2v_1 \in S\), a simple computation indicates that \(\langle {\hat{v}}-v_1,v_1 - v_0 \rangle _{\mathscr {V}} < 0\) contradicting the assumption that \(v_1\) solves the problem. \(\square \)

The case of S being a vector subspace gives rise to the notion of a Projection Operator. More precisely, using the notation of the Projection Theorem 2.17, the computation \(v_0 \mapsto v_1\) defines a mapping \( P_{S}:\mathscr {V} \rightarrow S\). The orthogonality condition also indicates that the mapping is linear and hence \(P_{S}\) is a linear operator called the Orthogonal Projection Operator onto S. It is bounded as, writing \(v_1 = P_S v_0\), using the orthogonality condition, and noting that \(0 \in S\),

$$\begin{aligned} \begin{array}{rcl} \Vert v_0\Vert ^2 = \Vert (v_0 -v_1) + v_1\Vert ^2&{} = &{} \Vert v_0 - v_1\Vert ^2 + 2\langle v_0 -v_1 , v_1 \rangle + \Vert v_1\Vert ^2 \\ &{} = &{} \Vert v_0 - v_1\Vert ^2 + \Vert v_1\Vert ^2 \ge \Vert v_1\Vert ^2 \end{array} \end{aligned}$$
(2.126)

which, together with the observation that \(P_Sv_0=v_0\) if, and only if, \(v_0 \in S\), gives

$$\begin{aligned} \Vert P_S\Vert =1~. \end{aligned}$$
(2.127)

From the definitions \(P_S^{~2} = P_S\) and hence \(ker[I-P_S] = S\). In addition, any vector \(v_0 \in \mathscr {V}\) has a unique decomposition of the form \(v_0= v_1 +(v_0-v_1) = P_Sv_0 + (I-P_S)v_0\) where \(v_1=P_Sv_0 \in S\) and \((I-P_S)v_0\) is orthogonal to S and hence to \(v_1\). As a consequence, \(\mathscr {V}\) has a direct sum decomposition of the form

$$\begin{aligned} \mathscr {V} = S \oplus S^{\perp } \quad where~ S= ker[I-P_S] ~and~ S^{\perp } = ker[P_S]. \end{aligned}$$
(2.128)

In particular, for any u and v in \(\mathscr {V}\), it follows that \(\langle P_Su, v \rangle = \langle P_S u, P_Sv \rangle = \langle u, P_Sv \rangle \) so that \(P_S\) is self adjoint and positive (but not strictly positive).
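
The finite dimensional case provides a simple illustration of these properties. The sketch below (illustrative data, Euclidean inner product) builds the orthogonal projection onto the column space of a matrix and verifies idempotence, self-adjointness, the unit norm and the orthogonality of \(v_0 - P_Sv_0\) to S.

```python
import numpy as np

# Sketch: orthogonal projection P_S onto the column space of A in R^5 with the
# Euclidean inner product (illustrative data).
rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2))               # S = span of the two columns of A
P = A @ np.linalg.solve(A.T @ A, A.T)         # P_S = A (A^T A)^{-1} A^T

print(np.allclose(P @ P, P))                  # idempotent
print(np.allclose(P, P.T))                    # self adjoint
print(np.isclose(np.linalg.norm(P, 2), 1.0))  # unit operator norm

v0 = rng.standard_normal(5)
v1 = P @ v0
print(np.allclose(A.T @ (v0 - v1), 0.0))      # v0 - v1 is orthogonal to S
```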

Finally, another form of convex set that plays a useful role in the analysis of linear control systems is that of a Linear Variety , namely a convex set S that is constructed by a translation of a subspace \(\mathscr {W}\) . The resultant set is denoted by \(S = a + \mathscr {W}\) where \(a\in \mathscr {V}\) defines the translation and

$$\begin{aligned} S = \{v : v=a+w \quad for ~ some ~w \in \mathscr {W}~\}. \end{aligned}$$
(2.129)

Note that the choice of a is not unique as it can be replaced by any vector \(~a + w_0\) with \(w_0 \in \mathscr {W}\). As \(v-v_0 = (v-a) - (v_0 - a)\) and \(v - a \in \mathscr {W}\) whenever \(v \in S\), the solution of Problem (2.122) in this case can be expressed in the form

$$\begin{aligned} v_1 =a + P_{\mathscr {W}}(v_0 - a) \end{aligned}$$
(2.130)

A useful example of a closed linear variety is the set

$$\begin{aligned} S= \{ u: r=Gu+d\} = u_0 + \mathscr {W}, \quad with ~ \mathscr {W} = ker[G] \end{aligned}$$
(2.131)

where \(G: \mathscr {U} \rightarrow \mathscr {Y}\) is linear and bounded, \(~\mathscr {U}\) and \(\mathscr {Y}\) are real Hilbert spaces, \(r \in \mathscr {Y}\), \(d \in \mathscr {Y}\) and \(u_0 \in \mathscr {U}\) is any point (assumed to exist) satisfying \(r=Gu_0+d\). Another example of a closed linear variety is that of a closed Hyperplane defined by taking \(\mathscr {Y} = \mathscr {R}\) and G as the map \(G: u \mapsto \langle \alpha , u \rangle \) (for some \(\alpha \in \mathscr {U}\)) and setting

$$\begin{aligned} S=\{u : \langle \alpha , u \rangle = c\} \end{aligned}$$
(2.132)

where both \(\alpha \in \mathscr {U}\) and the real number c are specified. If \(u_0 \in S\), then \(S= \{ u: \langle \alpha , u - u_0 \rangle = 0\}\) which identifies the set of vectors \(u-u_0\) as that of all vectors orthogonal to \(\alpha \). A Separating Hyperplane separating two sets \(S_1\) and \(S_2\) in a real Hilbert space \(\mathscr {V}\) is a hyperplane of the above type where \(\langle \alpha , u \rangle \ge c\) for all \(u \in S_1\) and \(\langle \alpha , u \rangle \le c\) for all \(u \in S_2\) or vice versa. An example of a separating hyperplane is obtained from the result describing the minimum distance from \(v_0 \in \mathscr {V}\) to a closed convex set S. More precisely, suppose that \(v_0\) is not in S. From Theorem 2.16, the hyperplane \(\langle v - v_1, v_1 - v_0 \rangle =0\) separates the point set \(\{v_0\}\) from S. Separating hyperplanes are not unique as the hyperplane \(\langle v - \lambda v_1 - (1 - \lambda )v_0, v_1 - v_0 \rangle =0\) is also a separating hyperplane when \(\lambda \in [0,1]\).
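
Projection onto a closed linear variety of the form (2.131) also has a simple matrix realization. The sketch below (illustrative data, Euclidean inner product, G of full row rank) computes the projection of \(v_0\) onto \(S=\{u: Gu=c\}\) and checks that the correction \(v_1 - v_0\) is orthogonal to \(ker[G]\), consistent with (2.130).

```python
import numpy as np

# Sketch: projection onto the linear variety S = {u : G u = c} with the Euclidean
# inner product.  With W = ker[G] and any u0 in S, (2.130) gives
# v1 = u0 + P_W(v0 - u0), which reduces to the explicit formula below.
rng = np.random.default_rng(5)
G = rng.standard_normal((2, 5))               # illustrative G, full row rank
c = rng.standard_normal(2)
v0 = rng.standard_normal(5)

# v1 = v0 - G^T (G G^T)^{-1} (G v0 - c)
v1 = v0 - G.T @ np.linalg.solve(G @ G.T, G @ v0 - c)

print(np.allclose(G @ v1, c))                       # v1 lies in S
null_basis = np.linalg.svd(G)[2][2:].T              # orthonormal basis of ker[G]
print(np.allclose(null_basis.T @ (v1 - v0), 0.0))   # v1 - v0 orthogonal to ker[G]
```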

2.6 Optimal Control Problems in Hilbert Space

Quadratic optimization problems for linear systems play a central role in this text. In general, signals will be regarded as being vectors in suitable real Hilbert spaces. Operators will be used to represent system behaviour by providing a linear relationship between system output signals and its input signals. The minimization of objective functions created using quadratic functions of signal norms is the chosen mechanism for creating new control algorithms with known properties. The use of this level of abstraction will be seen to provide solutions that cover a wide range of problems of interest, ranging from the case of continuous dynamics to sampled data systems to multi-rate systems and many other useful situations. The details are provided in the following chapters but the general form of the solution can be derived using the mathematical methods already described in this chapter.

A system will be described as a mapping between a set \(\mathscr {U}\) of input signals u and a set \(\mathscr {Y}\) of resultant output signals y. Both \(\mathscr {U}\) and \(\mathscr {Y}\) are taken to be normed vector spaces. A Linear System is characterized by a bounded, linear operator \(G: \mathscr {U} \rightarrow \mathscr {Y}\) and the input to output mapping is defined by a relation of the form

$$\begin{aligned} y=Gu+d \end{aligned}$$
(2.133)

where d represents the output behaviour when the input \(u=0\) (and hence, typically, behaviours due to initial condition and disturbances). A general form of Linear, Quadratic, Optimal Control Problem can be defined as the computation of the input that minimizes the Objective Function (often called a Performance Index or Performance Criterion )

$$\begin{aligned} J(u) = \Vert r-y\Vert ^2_{\mathscr {Y}} + \Vert u_0-u\Vert ^2_{\mathscr {U}} \end{aligned}$$
(2.134)

subject to the constraint that y and u are linked by \(y=Gu+d\). The vectors \(u_0 \in \mathscr {U}\) and \(r \in \mathscr {Y}\) are assumed to be known and the problem is interpreted as an attempt to reduce the variation of the output from the specified signal r whilst not using input signals that deviate too much from \(u_0\). The relative weighting of these two objectives is reflected in the choice of norms in \(\mathscr {Y}\) and \(\mathscr {U}\).

The solution to this problem when both \(\mathscr {U}\) and \(\mathscr {Y}\) are real Hilbert spaces is particularly valuable. Denote the adjoint operator of G by \(G^*\) and use the notation

$$\begin{aligned} \begin{array}{rcl} e= r-y = r - Gu - d \quad &{} and &{} \quad e_0 = r - y_0~ \\ &{} where &{} \quad y_0 = Gu_0 + d~ \end{array} \end{aligned}$$
(2.135)

is the output response to the input \(u_0\).

Theorem 2.18

(Solution of the Optimal Control Problem) With the problem definition given above, the input-output pair (y, u) minimizing the objective function (2.134) subject to the constraint (2.133) is given by the implicit formula

$$\begin{aligned} u = u_0 + G^* e \end{aligned}$$
(2.136)

As a consequence,

$$\begin{aligned} e= (I+GG^*)^{-1} e_0 \quad and \, hence \quad u = u_0 + G^*(I+GG^*)^{-1} e_0. \end{aligned}$$
(2.137)

In particular, the minimum value of J(u) can be computed to be

$$\begin{aligned} \min J(u) = \langle e_0, (I+GG^*)^{-1}e_0 \rangle _{\mathscr {Y}}. \end{aligned}$$
(2.138)

Proof

Two alternative proofs are provided in the next two subsections. \(\square \)

The proofs depend on the material in the previous sections. In particular, they depend on the algebraic properties of inner products, the properties of the adjoint operator, the Projection Theorem and the identities

$$\begin{aligned} \begin{array}{c} G^*(I+GG^*)^{-1} = (I+G^*G)^{-1}G^*, \quad G(I+G^*G)^{-1}G^* = GG^*(I+GG^*)^{-1}, \\ \qquad and \,(I+GG^*)^{-1} + GG^*(I+GG^*)^{-1}=I. \end{array} \end{aligned}$$
(2.139)

The inverses \((I+GG^*)^{-1}\) and \((I+G^*G)^{-1}\) exist and are bounded due to the positivity of \(GG^*\) and \(G^*G\) and the resultant lower bounds \(I+GG^* \ge I >0\) and \(I+G^*G \ge I >0\).
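
The identities (2.139) are easily verified numerically in the matrix case. The sketch below (illustrative data, Euclidean inner products so that \(G^*=G^T\)) checks all three.

```python
import numpy as np

# Sketch: matrix check of the operator identities (2.139) used in the proofs.
rng = np.random.default_rng(6)
G = rng.standard_normal((4, 3))                      # illustrative G : R^3 -> R^4
Gs = G.T                                             # G* for Euclidean inner products
Ip, Iq = np.eye(4), np.eye(3)

lhs1 = Gs @ np.linalg.inv(Ip + G @ Gs)
rhs1 = np.linalg.inv(Iq + Gs @ G) @ Gs
lhs2 = G @ np.linalg.inv(Iq + Gs @ G) @ Gs
rhs2 = G @ Gs @ np.linalg.inv(Ip + G @ Gs)
lhs3 = np.linalg.inv(Ip + G @ Gs) + G @ Gs @ np.linalg.inv(Ip + G @ Gs)

print(np.allclose(lhs1, rhs1), np.allclose(lhs2, rhs2), np.allclose(lhs3, Ip))
```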

2.6.1 Proof by Completing the Square

Note that \(G(u-u_0) = -(e-e_0)\) and consider the following inner product

$$\begin{aligned} \begin{array}{rcl} \gamma &{} = &{} \langle u-u_0-G^*(I+GG^*)^{-1}e_0, (I+G^*G)(u-u_0-G^*(I+GG^*)^{-1}e_0) \rangle _{\mathscr {U}} \\ &{} \ge &{} \Vert u-u_0-G^*(I+GG^*)^{-1}e_0 \Vert ^2_{\mathscr {U}} ~ \ge ~ 0, \end{array} \end{aligned}$$
(2.140)

noting that it is equal to zero if, and only if, \(u-u_0-G^*(I+GG^*)^{-1}e_0=0\). If this condition is valid then operating on the equation with G gives \(e=e_0 - GG^*(I+GG^*)^{-1}e_0=(I+GG^*)^{-1}e_0\), which would prove (2.137) and hence (2.136). It remains, therefore, to prove that \(\gamma = 0\).

The inner product can be written as

$$\begin{aligned} \begin{array}{rl} \gamma = &{} \langle u-u_0-(I+G^*G)^{-1}G^*e_0, (I+G^*G)(u-u_0-(I+G^*G)^{-1}G^*e_0) \rangle _{\mathscr {U}} \\ = &{} \langle u-u_0, (I+G^*G)(u-u_0) \rangle _{\mathscr {U}} - 2 \langle u-u_0, G^* e_0 \rangle _{\mathscr {U}} + \langle G^*e_0, (I+G^*G)^{-1}G^*e_0 \rangle _{\mathscr {U}} \\ = &{} \Vert u-u_0\Vert ^2_{\mathscr {U}} + \Vert e - e_0\Vert ^2_{\mathscr {Y}} + 2 \langle e-e_0, e_0 \rangle _{\mathscr {Y}} + \langle G^*e_0, (I+G^*G)^{-1}G^*e_0 \rangle _{\mathscr {U}} \\ = &{} \Vert u-u_0\Vert ^2_{\mathscr {U}} + \Vert e\Vert ^2_{\mathscr {Y}} +\Vert e_0\Vert ^2_{\mathscr {Y}} - 2 \langle e,e_0 \rangle _{\mathscr {Y}} + 2 \langle e-e_0, e_0 \rangle _{\mathscr {Y}} \\ &{} \qquad \qquad \qquad \qquad + \langle G^*e_0, (I+G^*G)^{-1}G^*e_0 \rangle _{\mathscr {U}} \\ = &{} J(u) + \langle G^*e_0, (I+G^*G)^{-1}G^*e_0 \rangle _{\mathscr {U}} - \Vert e_0\Vert ^2_{\mathscr {Y}} \\ = &{} J(u) + \langle e_0, G(I+G^*G)^{-1}G^*e_0 \rangle _{\mathscr {Y}} - \Vert e_0\Vert ^2_{\mathscr {Y}} \\ = &{} J(u) + \langle e_0, \left( GG^*(I+GG^*)^{-1} -I \right) e_0 \rangle _{\mathscr {Y}} \\ = &{} J(u) - \langle e_0, (I+GG^*)^{-1}e_0 \rangle _{\mathscr {Y}} \end{array} \end{aligned}$$
(2.141)

the second term being independent of (y, u). It follows that J(u) is minimized if, and only if, \(\gamma =0\). This first proof of Theorem 2.18 is now complete. \(\square \)

2.6.2 Proof Using the Projection Theorem

An alternative derivation of the solution uses the Projection Theorem in the product space \(\mathscr {Y} \times \mathscr {U}\) of input/output pairs (y, u) regarded as a real Hilbert space with inner product (and associated norm) defined using

$$\begin{aligned} \begin{array}{rcl} \langle (z,v), (y,u) \rangle _{\mathscr {Y} \times \mathscr {U}} &{} = &{} \langle z,y \rangle _{\mathscr {Y}} + \langle v,u \rangle _{\mathscr {U}} \\ \Vert (y,u)\Vert _{\mathscr {Y}\times \mathscr {U}} &{} = &{} \sqrt{ \Vert y\Vert ^2_{\mathscr {Y}} + \Vert u\Vert ^2_{\mathscr {U}} }. \end{array} \end{aligned}$$
(2.142)

With this notation, \(J(u) = \Vert (r,u_0) - (y,u)\Vert ^2_{\mathscr {Y} \times \mathscr {U}}\) and the optimal control problem is that of finding the pair \((y_1,u_1)\) that solves the minimum norm problem

$$\begin{aligned} \begin{array}{rcl} (y_1,u_1) &{} = &{} \arg \min \{ \Vert (r,u_0) - (y,u)\Vert ^2 : (y,u) \in S \} \\ where \quad S &{} = &{} \{(y,u) : y=Gu+d \} \end{array} \end{aligned}$$
(2.143)

is a linear variety in \(\mathscr {Y} \times \mathscr {U}\). It is closed as any sequence \(\{(y_j,u_j)\}_{j \ge 0}\) in S converging (in the norm topology) to a point (y, u) has the property that \(\{y_j\}_{j \ge 0}\) converges to y in \(\mathscr {Y}\) and \(\{u_j\}_{j \ge 0}\) converges to u in \(\mathscr {U}\). Also, as \(y_j = Gu_j+d, ~ j \ge 0\),

$$\begin{aligned} \begin{array}{rcl} 0 \le \Vert y-Gu-d\Vert _{\mathscr {Y}} &{} = &{} \Vert (y-y_j)-G(u-u_j)+( y_j-Gu_j-d )\Vert _{\mathscr {Y}} \\ &{} \le &{} \Vert (y-y_j)-G(u-u_j)\Vert _{\mathscr {Y}} + \Vert y_j-Gu_j-d \Vert _{\mathscr {Y}} \\ &{} = &{} \Vert (y-y_j)-G(u-u_j)\Vert _{\mathscr {Y}} \\ &{} \le &{} \Vert (y-y_j)\Vert _{\mathscr {Y}} + \Vert G\Vert \Vert (u-u_j)\Vert _{\mathscr {U}} \end{array} \end{aligned}$$
(2.144)

which tends to zero as \(j \rightarrow \infty \). Hence \(y=Gu+d\) which proves that \((y,u) \in S\).

Applying the projection theorem, the solution \((y_1,u_1)\) of the optimal control problem satisfies

$$\begin{aligned} \langle (z,v) -(y_1,u_1), (y_1,u_1) - (r,u_0) \rangle _{\mathscr {Y} \times \mathscr {U}}= 0 \quad for ~ all~ (z,v) \in S~. \end{aligned}$$
(2.145)

This equation is just

$$\begin{aligned} \langle z - y_1, y_1-r \rangle _{\mathscr {Y}} + \langle v - u_1, u_1-u_0 \rangle _{\mathscr {U}}= 0 \end{aligned}$$
(2.146)

Using the equations \(y_1=Gu_1+d\), \(z=Gv+d\) and \(e=r-y_1\) then gives

$$\begin{aligned} \langle v-u_1, -G^*e \rangle _{\mathscr {U}} + \langle v-u_1, u_1-u_0 \rangle _{\mathscr {U}}= \langle v-u_1, u_1-u_0 - G^*e \rangle _{\mathscr {U}} =0 \end{aligned}$$
(2.147)

for all \(v \in \mathscr {U}\). Choosing \( v = 2u_1 - u_0 - G^* e\), it follows that \(\Vert u_1 - u_0 - G^* e\Vert ^2=0\). This proves the result using the same algebraic manipulations as those used in the previous subsection, the minimum value of the objective function being computed as follows,

$$\begin{aligned} \Vert r-y_1\Vert ^2_{\mathscr {Y}} + \Vert u_0-u_1\Vert ^2_{\mathscr {U}} = \langle e, (I+GG^*)e \rangle _{\mathscr {Y}} = \langle e_0, (I+GG^*)^{-1}e_0 \rangle _{\mathscr {Y}}. \end{aligned}$$
(2.148)

This completes the second proof of Theorem 2.18. \(\square \)

2.6.3 Discussion

The solution of the optimal control problem described above provides a formal approach to the solution of a wide class of problems following the process summarized as the steps,

  1. 1.

    Identify the vector space \(\mathscr {U}\) from which the input signals are to be chosen.

  2. 2.

    Choose an inner product and norm for \(\mathscr {U}\) that ensures that it is a real Hilbert space and also reflects the physical importance of signals.

  3. 3.

    Identify the vector space \(\mathscr {Y}\) containing the output signals.

  4. 4.

    Choose an inner product and norm for \(\mathscr {Y}\) that ensures that it is a real Hilbert space and also reflects the physical importance of signals.

  5. 5.

    Characterize the system as a bounded linear mapping G from \(\mathscr {U}\) into \(\mathscr {Y}\) and identify the form of its adjoint operator \(G^*\).

  6. 6.

    Write the defining relationship for the optimal solution in the implicit form \(u = u_0 + G^* e\) with \(e=r-y\) and find a causal representation of this controller that can be implemented in real life.

This process could apply to any problem satisfying the assumptions but the devil is in the detail. The main problem is that expressed in the last step, namely the conversion of the implicit relationship between u and e into a useable computation. In later chapters (see for example, Sects. 3.10 and 4.7), this idea will be linked, in the special cases of linear state space models, to Two Point Boundary Value Problems and the use of Riccati equations. More generally, the computations suffer from additional complexities and high dimensionality. Even the simplest cases present challenges. For example, let \(\mathscr {U} = \mathscr {R}^q\) with inner product \(\langle {\hat{v}}, v \rangle _{\mathscr {U}} = {\hat{v}}^TRv\) (where \(R=R^T >0\)) and \(\mathscr {Y} = \mathscr {R}^p\) with inner product \(\langle {\hat{w}}, w \rangle _{\mathscr {Y}} = {\hat{w}}^TQw\) (where \(Q=Q^T >0\)). The operator G is a \(p \times q\) real matrix with adjoint (Eq. (2.104)) defined by the \(q \times p\) matrix \(~G^*= R^{-1}G^T Q\). Rather than using the implicit relationship, the direct computation of u can be undertaken using

$$\begin{aligned} u = u_0 + G^*(I+GG^*)^{-1}e_0 = u_0 + R^{-1}G^TQ(I_p+GR^{-1}G^TQ)^{-1}e_0\end{aligned}$$
(2.149)

This is a feasible approach to finding the solution and may work well in many cases but, if the dimensions p and q are large, the calculation of the inverse matrix could be challenging, particularly if \(I_p+GR^{-1}G^TQ\) is ill-conditioned. This example does have relevance to the topics in the text associated with the description of discrete time systems in supervector form (see Sect. 4.7). Simplifications are possible in this case as the elements in G have a structural pattern that makes the inversion implicit in \(G^*(I+GG^*)^{-1}\) equivalent to the solution of a Two Point Boundary Value Problem which is solved using a Riccati equation and associated simulation methods.
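
For completeness, the direct computation (2.149) can be sketched in a few lines of NumPy. The example below (illustrative dimensions, weights and data only) forms the optimal input, confirms that random perturbations do not reduce J(u), and checks the minimum value (2.138).

```python
import numpy as np

# Sketch: the direct computation (2.149) for the finite-dimensional case of
# Sect. 2.6.3 (illustrative data; R and Q define the inner products on U and Y).
rng = np.random.default_rng(7)
p, q = 6, 4
G = rng.standard_normal((p, q))
A = rng.standard_normal((q, q)); R = A @ A.T + q * np.eye(q)
B = rng.standard_normal((p, p)); Q = B @ B.T + p * np.eye(p)

r, d, u0 = rng.standard_normal(p), rng.standard_normal(p), rng.standard_normal(q)
e0 = r - (G @ u0 + d)

Gs = np.linalg.solve(R, G.T @ Q)                          # G* = R^{-1} G^T Q
u_opt = u0 + Gs @ np.linalg.solve(np.eye(p) + G @ Gs, e0) # u = u0 + G*(I+GG*)^{-1} e0

def J(u):
    e = r - (G @ u + d)
    return e @ Q @ e + (u0 - u) @ R @ (u0 - u)            # objective (2.134)

# the optimum is not improved by small random perturbations of the input
print(all(J(u_opt) <= J(u_opt + 0.1 * rng.standard_normal(q)) for _ in range(200)))
# minimum value agrees with (2.138): <e0, (I+GG*)^{-1} e0>_Y
print(np.isclose(J(u_opt), e0 @ Q @ np.linalg.solve(np.eye(p) + G @ Gs, e0)))
```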

2.7 Further Discussion and Bibliography

The chapter has reviewed material that plays a role in the development of the techniques used in this text. Matrices form the computational core of the algorithms and, although many readers will be familiar with basic matrix algebra, an understanding of the structures and the analysis tools available for matrix methodologies is helpful as is an understanding of the way that matrices are useful in the interpretation of high (but finite) dimensional problems using simple geometrical insights generated from the familiar three-dimensional world. There are many texts that cover the material required ranging from undergraduate engineering texts such as [60, 105, 106] to essentially mathematical texts that approach the topics using both algebraic concepts and the ideas of finite dimensional vector spaces [45, 46, 53] within which matrices are representations of operators using a specified basis set. Many teaching texts on control theory and control engineering also have a summary of the necessary material [4, 39, 43, 63, 71, 81]. The material is essentially the same but differing perspectives and different levels of abstraction are used. It is useful to note that there are links between matrices and transfer function descriptions using Toeplitz matrices, details of which can be found in [51]. For the purposes of this text, an understanding of, and fluency with, the algebraic structures and analysis tools will help the reader to “see through the symbols” and concentrate more usefully on the form and meaning of the system properties used and the nature of the algorithms described. An understanding of the algebraic and computational aspects of matrix theory will form the basis for any computational software required for the exploitation of the material and also in ensuring that data formats fit the necessary matrix structures.

Relevant techniques from functional analysis are also summarized in the chapter. This material will be less familiar to many readers but, in its simplest form, it can be regarded as a generalization of matrix theory to cover a wider range of problems. In particular, matrices are replaced by operators between, possibly infinite dimensional, signal spaces and the geometry of the three dimensional world is generalized to higher, possibly infinite, dimensions. The underpinning ideas of vector spaces endowed with norms to measure signal magnitude and the notion of bounded linear operators between such spaces mirror the familiar notion of a system as a device that maps input signals into output signals and the measurement of signal magnitude using measures such as least square values or maximum magnitudes. Although much of this work can be viewed at the algebraic level as being very similar to matrix (or even transfer function) methodologies, the technical details associated with the ideas do contain many subtle issues that need to be considered at the mathematical level. These take many forms but, perhaps the most important are those of existence of solutions to defined problems, the convergence of infinite sequences, the introduction of the notion of adjoint operators and their properties, convexity and the idea of projection onto convex sets. There are many texts that provide the mathematical background for these topics including general texts on analysis [101] and functional analysis such as [12, 31, 52, 107] and more specialist texts on Hilbert space theory such as [54], operator theory [32, 33] and optimization algorithms using functional analytic methods [69]. A reference to Open Mapping and Closed Graph Theorems is found in [73] and an extensive analysis of projection methodologies is found in [36]. In the author’s experience, the choice of text to suit the needs of a particular reader depends upon that reader and his or her background and preferred way of thinking.

Finally, the content of the text is mathematical in its chosen language and much of the supporting mathematics of Laplace and Z-transforms is used extensively in control engineering texts. Some of the more advanced tools needed can be found in texts on classical and multivariable control (see Sect. 1.5), mathematical systems theory [59, 104] and specialist texts [112] and papers [75] on geometric systems theory and decoupling theory [37, 47].