Gradient methodologies aim to reduce the magnitude of the tracking error from iteration to iteration, the ultimate objective being to minimize that tracking error. They are, as a consequence, strongly related to the idea of optimization regarded as the minimization of some objective function subject to defined constraints. This chapter creates a new perspective on Iterative Learning Control and monotonicity beginning from the construction of optimization problems that ensure monotonic reductions of errors from iteration to iteration. Emphasis is placed on wide applicability, establishing convergence properties and convergence rates, assessing the effects of parameters and characterizing robustness to modelling errors.

The approach is structured so that familiar optimization algorithms including those used in optimal control theory can be used to provide implementable solutions. For linear, discrete or continuous, state space systems, the control laws have both feedforward and state feedback representations related to the structures seen in quadratic optimal control. However, the chapter develops and describes the approach by writing the models and algorithms in operator notation and by regarding signals as points in defined Hilbert spaces. The approach is notationally compact and provides a clear, geometrical interpretation of algorithm behaviour. In addition, although it is algebraically similar to the use of transfer function notation, it has wider applicability and demonstrates that the algorithms apply to a wide class of model types and tracking objectives. The power of the approach will be seen in the following sections and chapters by their application to

  1.

    tracking of a specified reference signal at each point on a finite time interval or

  2.

    finding a suitable input signal to ensure that the plant output “passes through” specified points (that is, takes required values) at selected points on the time interval of interest (the so-called Intermediate Point problem).

  3.

    The two ideas briefly described above can be combined to solve Multi-task Tracking problems where, together with the need to pass through intermediate points, the output is required to track reference signals on one or more subintervals and plant motion may be free on other subintervals.

These applications are possible as the plant operators G are associated with adjoints \(G^*\) with simple, state space structure and realizations that can be implemented as iterative feedback and feedforward controls.

The reader will see that this chapter describes a benchmark solution to many linear tracking problems but does not necessarily include all aspects of interest in applications. The addition of more complex design requirements such as satisfying auxiliary design objectives and “hard” input/output constraints is left for later chapters including Chaps. 11 and 12.

9.1 Problem Formulation and Formal Algorithm

In what follows, input-output dynamics are described by the familiar relationship \(y=Gu+d\) with input \(u \in \mathscr {U}\) and output \(y \in \mathscr {Y}\) where \(\mathscr {U}\) and \(\mathscr {Y}\) are real Hilbert spaces. The control objective is to track a specified reference signal \(r \in \mathscr {Y}\) using an Iterative Control procedure. The tracking error is \(e=r-y\) and, as in previous chapters, signals f on iteration k are denoted by \(f_k\). Iterations are initiated by an input signal \(u_0\) generating an initial error \(e_0 = r-y_0\) with output \(y_0=Gu_0+d\) being obtained either from simulation or from experiments on the physical plant.

9.1.1 The Choice of Objective Function

Consider the problem of reducing the magnitude of a non-zero, initial error \(e_0\) by a change \(u_1-u_0\) in \(u_0\). The gradient methodologies of Chap. 7 produce reductions in error norm by choosing a suitable descent direction \(G^*e_0\) and ensuring, by choice of \(\beta \), that the magnitude of the change \(\beta G^*e_0\) in input signal is not too large. A similar effect is obtained naturally by minimizing an objective function

$$\begin{aligned} \underbrace{J(u,u_0) = \Vert e\Vert ^2_{\mathscr {Y}} + \varepsilon ^2 \Vert u-u_0\Vert ^2_{\mathscr {U}}}_{(Objective ~~ Function)}, \quad with \quad e=r-y, \quad where \quad \varepsilon ^2 > 0, \end{aligned}$$
(9.1)

subject to the dynamical system constraint \(y=Gu+d\).

Note: The function contains error magnitudes and the magnitude of the change in input signal. It is expressed in a quadratic form because this will lead to solutions that depend linearly on data \(u_0,~e_0\). The parameter \(\varepsilon ^2\) is added to provide a simple mechanism for varying the relative contributions of \(\Vert e\Vert ^2\) and \(\Vert u - u_0\Vert ^2\) to the objective function. Intuitively, if it is large, then the optimization will place greater emphasis on making u very similar to \(u_0\). If it is small, greater variation in u is permitted to achieve a greater reduction in the error norm \(\Vert e\Vert \).

The formal way of representing this problem is to write the minimizing input \(u_1\) as

$$\begin{aligned} u_1 = \arg \min _{u \in \mathscr {U}} \{~J(u,u_0)~:~e=r-y,~~ y=Gu+d \}~ \end{aligned}$$
(9.2)

This problem is essentially the problem discussed in Sect. 2.6 and, in particular, using the techniques of Theorem 2.18, gives the solution \(u_1=u_0+\varepsilon ^{-2}G^*e_1\) where \(e_1 = r-y_1\) and \(y_1=Gu_1+d\). As \(J(u_1,u_0) \le J(u, u_0)\) for all \(u \in \mathscr {U}\), it follows, choosing \(u=u_0\), that

$$\begin{aligned} \Vert e_1\Vert ^2_{\mathscr {Y}} + \varepsilon ^2\Vert u_1-u_0\Vert ^2_{\mathscr {U}} \le \Vert e_0\Vert ^2_{\mathscr {Y}} \quad and ~~ hence \quad \Vert e_1\Vert ^2_{\mathscr {Y}}\le \Vert e_0\Vert ^2_{\mathscr {Y}}, \quad for ~~ all ~~ \varepsilon > 0, \end{aligned}$$
(9.3)

equality holding if, and only if, \(u_1 = u_0\). In such a situation \(G^*e_0=0\), a condition that cannot hold for non-zero \(e_0\) if \(ker[G^*]=\{0\}\).

In summary, by formulating the issue of reducing error norms as a quadratic optimal control problem in Hilbert space, the input update rule deduced from the optimization guarantees a reduction in error norm and removes the need for an explicit gain parameter \(\beta \), the update being driven by the new error \(e_1\) rather than \(e_0\).
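
As a simple illustration (a minimal numerical sketch, assuming a finite-dimensional model with identity weights so that \(G^*=G^T\), and hypothetical random data), the single optimization step can be computed directly and the properties (9.3) and \(u_1 = u_0+\varepsilon ^{-2}G^*e_1\) verified:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
G = rng.standard_normal((n, n))   # hypothetical plant operator (matrix model)
d = rng.standard_normal(n)        # hypothetical disturbance/initial-condition term
r = rng.standard_normal(n)        # hypothetical reference signal
u0 = np.zeros(n)                  # initial input
eps2 = 0.5                        # the weight epsilon^2 > 0

e0 = r - (G @ u0 + d)             # initial tracking error

# Minimizer of J(u, u0) = ||r - G u - d||^2 + eps2 ||u - u0||^2:
# setting the gradient to zero gives (G^T G + eps2 I)(u1 - u0) = G^T e0
u1 = u0 + np.linalg.solve(G.T @ G + eps2 * np.eye(n), G.T @ e0)
e1 = r - (G @ u1 + d)

# Implicit form of the text: u1 = u0 + eps^{-2} G^* e1
assert np.allclose(u1, u0 + G.T @ e1 / eps2)

# Monotonicity (9.3): ||e1||^2 + eps2 ||u1 - u0||^2 <= ||e0||^2
lhs = e1 @ e1 + eps2 * (u1 - u0) @ (u1 - u0)
print(lhs <= e0 @ e0, np.linalg.norm(e1) <= np.linalg.norm(e0))
```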

Regarding \(u_1\) and \(e_1\) as new initial signals, application of the same process will lead to a further reduction in error norm \(\Vert e_2\Vert ^2_{\mathscr {Y}}\le \Vert e_1\Vert ^2_{\mathscr {Y}}\). This process can be continued indefinitely and, using Theorem 2.18, leads to the following general algorithm and associated properties,

Algorithm 9.1

(Norm Optimal Iterative Learning Control) Using the notation and terminology of the preceding discussion, the Norm Optimal Iterative Learning Control algorithm initiated by the input \(u_0\) generates a sequence of inputs \(\{u_k\}_{k \ge 0}\) (and associated errors \(\{e_k\}_{k \ge 0}\)) by computing

$$\begin{aligned} u_{k+1} = \arg \min _{u \in \mathscr {U}} \{~J(u,u_k)~:~e=r-y,~~ y=Gu+d \}~ \quad for \quad k \ge 0. \end{aligned}$$
(9.4)

That is, \(u_{k+1}\) is the input that satisfies the dynamical relationships and also minimizes

$$\begin{aligned} J(u,u_k) = \Vert e\Vert ^2_{\mathscr {Y}} + \varepsilon ^2\Vert u-u_k\Vert ^2_{\mathscr {U}}, \quad with \quad e=r-y. \end{aligned}$$
(9.5)

The inputs and errors are related by the formulae

$$\begin{aligned} u_{k+1}&= u_k + \varepsilon ^{-2}G^* e_{k+1}, ~and~hence \quad e_{k+1} = L e_k, ~ k \ge 0, \nonumber \\&\quad \quad where~the~operator~\quad L=(I+\varepsilon ^{-2}GG^*)^{-1}, \end{aligned}$$
(9.6)

is self-adjoint and positive definite with \(0 < L \le I\) (a fact that follows from the monotonicity properties). In addition,

  1.

    The error sequence \(\{e_k\}_{k \ge 0}\) has norms that satisfy the monotonicity conditions,

    $$\begin{aligned} \Vert e_{k+1}\Vert ^2_{\mathscr {Y}} + \varepsilon ^2 \Vert u_{k+1}-u_k\Vert ^2_{\mathscr {U}} \le \Vert e_k\Vert ^2_{\mathscr {Y}}, \quad and ~~ hence, ~~~ \Vert e_{k+1}\Vert ^2_{\mathscr {Y}}\le \Vert e_k\Vert ^2_{\mathscr {Y}}~\,\nonumber \\ \end{aligned}$$
    (9.7)

    with equality holding if, and only if, \(u_{k+1} = u_k\).

  2.

    If equality holds for some value of \(k=\tilde{k}\), then \(u_k = u_{\tilde{k}}\) for all \(k \ge \tilde{k}\) and the algorithm converges in a finite number of iterations with \(\lim _{k \rightarrow \infty } e_k = e_{\infty } = e_{\tilde{k}}\).

  3.

    As \(\{\Vert e_k\Vert \}_{k \ge 0}\) is positive and reducing, the limit \(\lim _{k \rightarrow \infty } \Vert e_k\Vert _{\mathscr {Y}} = E_{\infty }\) exists with \(E_{\infty } \ge 0\). The error converges to zero if, and only if, \(E_{\infty } = 0\).

  4.

    If \(~ker[G^*]=\{0\}\) and \(e_0 \ne 0\), then \( 0 < \Vert e_{k+1}\Vert ^2_{\mathscr {Y}} < \Vert e_k\Vert ^2_{\mathscr {Y}}\) for all \(k \ge 0\).

    Proof If \(\Vert e_{k+1}\Vert ^2_{\mathscr {Y}} = \Vert e_k\Vert ^2_{\mathscr {Y}}\) holds for some index k, then \(u_{k+1}=u_k\) and hence \(G^*e_{k+1}=0\). It follows that \(e_{k+1}=e_k=0\) and hence \(u_k=u_{k-1}\) which leads to the conclusion that \(e_{k-1}=0\). An inductive argument then leads to the conclusion that \(e_j=0, ~~ 0 \le j \le k+1\) which contradicts the assumption that \(e_0 \ne 0\).   \(\square \)

  5.

    Writing \(\Vert e_k\Vert ^2_{\mathscr {Y}} - \Vert e_{k+1}\Vert ^2_{\mathscr {Y}} \ge \varepsilon ^2 \Vert u_{k+1} - u_k\Vert ^2_{\mathscr {U}}\) for all \(k \ge 0\) and adding gives

    $$\begin{aligned} \Vert e_0\Vert ^2_{\mathscr {Y}} - E^2_{\infty } \ge \varepsilon ^2 \sum _{k=0}^{\infty }~\Vert u_{k+1} - u_k\Vert ^2_{\mathscr {U}}, \end{aligned}$$
    (9.8)

    so that \(\lim _{k \rightarrow \infty } \Vert u_{k+1} - u_k\Vert _{\mathscr {U}} =0\). That is, ultimately, the changes in input signal become infinitesimally small.

  6.

    Finally, the minimum value of \(J(u,u_k)\) is \(J(u_{k+1},u_k)\) which takes the form

    $$\begin{aligned} J(u_{k+1},u_k) = \langle e_k, (I+\varepsilon ^{-2}GG^*)^{-1}e_k \rangle _{\mathscr {Y}} = \langle e_k, Le_k \rangle _{\mathscr {Y}} =\langle e_0, L^{2k+1}e_0 \rangle _{\mathscr {Y}}. \end{aligned}$$
    (9.9)

    Proof The derivation uses the defining relationships, the algebra

    $$\begin{aligned} J(u_{k+1},u_k) = \Vert e_{k+1}\Vert ^2_{\mathscr {Y}} + \varepsilon ^2\Vert \varepsilon ^{-2}G^*e_{k+1}\Vert ^2_{\mathscr {U}} = \langle e_{k+1}, (I+\varepsilon ^{-2}GG^*)e_{k+1} \rangle _{\mathscr {Y}} \end{aligned}$$
    (9.10)

    and the substitution \(e_{k+1}=Le_k\) and \(e_k= L^ke_0\).   \(\square \)

Note: For simplicity of presentation, the use of the label “Norm Optimal Iterative Learning Control” Algorithm will be abbreviated to the NOILC Algorithm.
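
Continuing the numerical sketch above (same assumptions: finite-dimensional model, identity weights, hypothetical random data), the iteration of Algorithm 9.1 and the relations (9.6), (9.7) and (9.9) can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
G = rng.standard_normal((n, n))                   # hypothetical model data
d, r = rng.standard_normal(n), rng.standard_normal(n)
eps2 = 0.5
L = np.linalg.inv(np.eye(n) + (G @ G.T) / eps2)   # the operator L of (9.6)

u = np.zeros(n)
e0 = r - (G @ u + d)
e = e0.copy()
for k in range(5):
    u_next = u + np.linalg.solve(G.T @ G + eps2 * np.eye(n), G.T @ e)
    e_next = r - (G @ u_next + d)
    assert np.allclose(e_next, L @ e)                          # e_{k+1} = L e_k, (9.6)
    J = e_next @ e_next + eps2 * (u_next - u) @ (u_next - u)   # optimal cost value
    assert np.isclose(J, e0 @ np.linalg.matrix_power(L, 2 * k + 1) @ e0)  # (9.9)
    assert np.linalg.norm(e_next) <= np.linalg.norm(e)         # (9.7)
    u, e = u_next, e_next
print("error norm after 5 iterations:", np.linalg.norm(e))
```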

9.1.2 Relaxed Versions of NOILC

The NOILC algorithm provides both a descent direction for \(\Vert e\Vert \) and, implicitly, a step size. Relaxation techniques retain this structure but use a variety of modifications to influence algorithm properties such as convergence rates and robustness. A relaxed version of NOILC that retains the feedback and optimization structure of the basic Algorithm 9.1 has the following form

Algorithm 9.2

(NOILC with Relaxation: Feedback Version) Using the notation of Algorithm 9.1, the relaxed version uses the modified objective function

$$\begin{aligned} J(u, \alpha u_k) = \Vert e\Vert ^2_{\mathscr {Y}} + \varepsilon ^2 \Vert u -\alpha u_k\Vert ^2_{\mathscr {U}}, \quad with \quad e=r-y \quad and \quad 0 < \alpha \le 1, \end{aligned}$$
(9.11)

obtained by replacing \(u_k\) by \(~\alpha u_k\). Minimizing this objective function leads to the relaxed input update formula

$$\begin{aligned} u_{k+1}=\alpha u_k + \varepsilon ^{-2}G^*e_{k+1}. \end{aligned}$$
(9.12)

Using the dynamics \(~y=Gu+d~\) results in the error evolution

$$\begin{aligned} e_{k+1} = L \left( \alpha e_k +(1 - \alpha )(r-d)\right) \quad where, ~~ again, \quad L = (I+\varepsilon ^{-2}GG^*)^{-1}. \end{aligned}$$
(9.13)

The feedback interpretation is motivated by the presence of \(e_{k+1}\) in the formula for \(u_{k+1}\) whilst relaxation is represented by replacing \(u_k\) by a scaled version \(\alpha u_k\).

An alternative approach to relaxed algorithm development is as follows,

Algorithm 9.3

(NOILC with Relaxation: Feedforward Version) Using the notation of Algorithm 9.1, a feedforward, relaxed version, on iteration \(k+1\), computes a preliminary input signal \(\tilde{u}_{k+1}\) by minimizing the NOILC objective function

$$\begin{aligned} J(u,u_k) = \Vert e\Vert ^2_{\mathscr {Y}} + \varepsilon ^2 \Vert u - u_k\Vert ^2_{\mathscr {U}}. \end{aligned}$$
(9.14)

Following this minimization, the new input \(u_{k+1}\) to be applied to the plant is computed from the relaxed, feedforward, input update formula

$$\begin{aligned} u_{k+1} = \alpha u_k + \beta (\tilde{u}_{k+1} - u_k)~ with \quad 0 < \alpha \le 1, ~~and ~~ 0 < \beta \le 1 \end{aligned}$$
(9.15)

incorporating relaxation defined by the parameter \(\alpha \) plus an additional “gain” \(\beta \). Using the dynamics \(y=Gu+d\) yields

$$\begin{aligned} \tilde{u}_{k+1} - u_k = \varepsilon ^{-2}G^*(I+\varepsilon ^{-2}GG^*)^{-1}e_k ~ \end{aligned}$$
(9.16)

and hence that

$$\begin{aligned} e_{k+1} = ( (\alpha - \beta )I + \beta L ) e_k +(1 - \alpha )(r-d) \quad where,~ again, \quad L = (I+\varepsilon ^{-2}GG^*)^{-1}. \end{aligned}$$
(9.17)

The relaxation parameter \(\alpha \) plays its usual role whilst the introduction of the parameter \(\beta \) is motivated by its use in, for example, inverse model and gradient algorithms as described in Chaps. 6 and 7. This link can be reinforced by writing

$$\begin{aligned} (\alpha - \beta )I + \beta L = \alpha I - \beta \varepsilon ^{-2}GG^*(I+\varepsilon ^{-2}GG^*)^{-1} \end{aligned}$$
(9.18)

which is precisely the error evolution operator expected from the relaxed algorithm \(u_{k+1}=\alpha u_k + \beta K_0 e_k\) with feedforward compensator \(K_0 = \varepsilon ^{-2}G^*(I+\varepsilon ^{-2}GG^*)^{-1} \).
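
A brief numerical check of the feedforward relaxed update (again a hypothetical finite-dimensional example with \(G^*=G^T\)) confirming the error evolution (9.17) and the operator identity (9.18) is sketched below:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
G = rng.standard_normal((n, n))                   # hypothetical model data
d, r = rng.standard_normal(n), rng.standard_normal(n)
eps2, alpha, beta = 0.5, 0.9, 0.7
I = np.eye(n)
L = np.linalg.inv(I + (G @ G.T) / eps2)

u = rng.standard_normal(n)
e = r - (G @ u + d)

# Preliminary NOILC input (9.16), then the relaxed feedforward update (9.15)
u_tilde = u + (G.T / eps2) @ (L @ e)
u_next = alpha * u + beta * (u_tilde - u)
e_next = r - (G @ u_next + d)

# Error evolution (9.17)
assert np.allclose(e_next,
                   ((alpha - beta) * I + beta * L) @ e + (1 - alpha) * (r - d))
# Operator identity (9.18)
assert np.allclose((alpha - beta) * I + beta * L,
                   alpha * I - beta * (G @ G.T / eps2) @ L)
print("relaxed feedforward step verified")
```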

The behaviour of the three algorithms defined above is governed by the spectrum of the three operators \(L,~ \alpha L\) and \((\alpha - \beta )I + \beta L\) which, using the Spectral Mapping Theorem, can be computed from the spectral properties of L.

The next two sections underpin the algorithm descriptions given above by putting them into a familiar state space system context. This is then followed by a general analysis of the properties of the algorithms using operator theory methods including an eigenstructure interpretation that sheds some light on the role and choice of \(\varepsilon \).

9.1.3 NOILC for Discrete-Time State Space Systems

With the objective of setting the algorithms of the previous section into a more familiar context, the realization of Algorithm 9.1 when the plant model G takes the form of an m-output, \(\ell \)-input, linear, time-invariant, discrete time system S(ABC) of state dimension n, operating on a finite time interval \(0 \le t \le N\), is considered. The model has the form

$$\begin{aligned} x(t+1)&= Ax(t) + B u(t), \quad 0 \le t \le N-1, \quad with \quad x(0)=x_0, \nonumber \\ and \quad ~~ y(t)&= C x(t),\quad 0 \le t \le N. \end{aligned}$$
(9.19)

The initial condition \(x_0\) is assumed to be iteration independent and the objective of Iterative Control is to find an input \(u_{\infty }\) that tracks, exactly, a specified reference signal r defined by the time series \(r(t),~ 0 \le t \le N\). The model is most easily discussed by using supervector terminology and identifying the input and output spaces as \(\mathscr {U}=\mathscr {R}^{\ell (N+1)}\) and \(\mathscr {Y}=\mathscr {R}^{m(N+1)}\) respectively with inner products

$$\begin{aligned} \langle u, v \rangle _{\mathscr {U}} = \sum _{t=0}^{N}~ u^T(t)R(t)v(t) \quad and \quad \langle y, w \rangle _{\mathscr {Y}} = \sum _{t=0}^{N}~ y^T(t)Q(t)w(t) \end{aligned}$$
(9.20)

where the time varying weight matrices \(R(t) = R^T(t) >0\) and \(Q(t)=Q^T(t) >0\) for \(~ t = 0,1,2, \ldots , N\).

With the above definitions, Algorithm 9.1 becomes

Algorithm 9.4

(NOILC for Discrete Time, State Space Systems) Using the notation given above, suppose that, on iteration k, the input \(u_k\) was used and generated output and error time series \(y_k\) and \(e_k = r- y_k\). NOILC Algorithm 9.1 then constructs the input time series \(u_{k+1}(t),~ 0 \le t \le N,\) to be used on iteration \(k+1\) as the one that minimizes the quadratic objective function

$$\begin{aligned}&J(u,u_k)\nonumber \\&\quad = \sum _{t=0}^{N}\left( (r(t)- y(t))^TQ(t)(r(t)-y(t)) + \varepsilon ^2 (u(t)-u_k(t))^T R(t)(u(t) - u_k(t)) \right) . \end{aligned}$$
(9.21)

This input is applied to the plant to generate data \(y_{k+1}\) and \(e_{k+1}\), the index is updated and the process repeated indefinitely.

The optimal control problem is that of Sect. 4.7 with R(t) replaced by \(\varepsilon ^2 R(t)\). The solution can be implemented in either a feedback or feedforward form.

Two Feedback Implementations: Two feedback implementations can be considered. In the feedback Implementation One, the control signal has the form

$$\begin{aligned} u_{k+1}(t) = u_k(t) + \varepsilon ^{-2}R^{-1}(t) B^T \left( -K(t)x_{k+1}(t) + \xi _{k+1}(t) \right) , \quad 0 \le t \le N, \end{aligned}$$
(9.22)

where \(x_{k+1}(t)\) is the measured state vector at time t on iteration \(k+1\). In addition,

  1.

    the (iteration independent) \(n \times n\) state feedback matrices \(K(t),~ 0 \le t \le N\) are computed off-line before initialization of the iterative procedure using the nonlinear recursion

    $$\begin{aligned} \tilde{K}(t+1)&= A^TK(t+1) + C^T Q(t+1)C, \nonumber \\ K(t)&= \left( I+ \varepsilon ^{-2}\tilde{K}(t+1)BR^{-1}(t)B^T \right) ^{-1} \tilde{K}(t+1) A \end{aligned}$$
    (9.23)

    starting from the terminal condition \(K(N)=0\).

  2.

    The predictive term \(\xi _{k+1}(t),~ 0 \le t \le N\), is computed from the recursion, \(0 \le t \le N-1\),

    $$\begin{aligned} \xi _{k+1}(t)&= \left( I+ \varepsilon ^{-2}\tilde{K}(t+1)BR^{-1}(t)B^T \right) ^{-1} \psi _{k+1}(t), \quad ~~~ where, \nonumber \\ \psi _{k+1}(t)&= A^T\xi _{k+1}(t+1) -\tilde{K}(t+1)Bu_k(t) + C^TQ(t+1) r(t+1), \end{aligned}$$
    (9.24)

    beginning with the terminal condition \(\xi _{k+1}(N)=0\).

Note that, whereas the state feedback matrices are computed only once and used on all iterations, it is necessary to compute the predictive term \(\xi _{k+1}\) for each iteration. It is computed off-line, using a plant model, in the time between iteration k and \(k+1\) when the system is being reset for the next iteration. The feedback term provides performance data from the current state of the system on iteration \(k+1\) whilst the term \(\xi _{k+1}\) feeds information forward from iteration k.

An alternative approach uses \(e_k\) in the calculations. It is deduced by writing the input update formula in the form

$$\begin{aligned} u_{k+1} - u_k = \varepsilon ^{-2}G^*e_{k+1} = \varepsilon ^{-2}G^*Le_k. \end{aligned}$$
(9.25)

The second term \(\varepsilon ^{-2}G^*Le_k\) is identical to the control input computed using one iteration of NOILC for the system \(y-y_k = G(u-u_k)\) from the starting condition of zero input (\(u=u_k\)) and a reference signal equal to \(e_k\). Noting that the state for the system is \(x(t) - x_k(t)\), this leads to what will be called the feedback Implementation Two which generates the input signal using

$$\begin{aligned} u_{k+1}(t) = u_k(t) + \varepsilon ^{-2}R^{-1}(t) B^T \left( -K(t)\underbrace{\left( x_{k+1}(t)-x_k(t) \right) } + \xi _{k+1}(t) \right) , \end{aligned}$$
(9.26)

where the feedback gain K(t) now operates on the difference \(x_{k+1}(t)-x_k(t)\) in state vectors between iterations. The matrices \(\{K(t)\}_{0 \le t \le N}\) remain unchanged, but the predictive term \(\xi _{k+1}(t),~ 0 \le t \le N\), is computed from the modified recursion, \(0 \le t \le N-1\),

$$\begin{aligned} \xi _{k+1}(t)&= \left( I+ \varepsilon ^{-2}\tilde{K}(t+1)BR^{-1}(t)B^T \right) ^{-1} \psi _{k+1}(t), \quad ~~~ with, \nonumber \\ \psi _{k+1}(t)&= A^T\xi _{k+1}(t+1) + C^TQ(t+1) \underbrace{e_k(t+1)},\quad ~~~\xi _{k+1}(N)=0. \end{aligned}$$
(9.27)

The two feedback implementations differ in their use of data. Both implementations use current iteration data in the form of \(x_{k+1}(t)\) with previous iteration performance being represented by \(u_k(t)\) and, in Implementation Two, via the state \(x_k(t)\) and the presence of the error \(e_k(t)\) in the equations for \(\psi _{k+1}\). Intuitively, Implementation Two contains the greatest link to iteration k but, mathematically, the two are identical in the absence of modelling errors.

Feedforward Implementation: Given the data \(r,u_k,e_k\), the feedforward implementation of NOILC for discrete time state space systems is simply stated as the use of models to calculate the new input \(u_{k+1}\), off-line, in the time between iterations k and \(k+1\). These calculations can be approached using the formulae given for the feedback case above. Following its evaluation, \(u_{k+1}\) is applied to the plant and new data \(e_{k+1}\) obtained from sensors to allow iterations to continue.

There is, however, an important practical issue implicit in these comments, namely, that computations using the first feedback implementation in the form described above use only r and \(u_k\) to generate \(u_{k+1}\). In a feedforward computation, this calculation therefore ignores the actual behaviour of the plant as there is no feedback from the previously observed error \(e_k\). In effect, the algorithm is ignoring the external reality which plays no role in the computations at all! In contrast, the use of equations for Implementation Two includes measurements of the error data \(e_k(t)\) and provides the necessary link.
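
To make the feedforward computation concrete, the following sketch (hypothetical single-input, single-output data, not taken from the text) assembles the supervector form of the model (9.19), constructs the adjoint \(G^* = R_b^{-1}G^TQ_b\) induced by the inner products (9.20) with block weight matrices \(R_b\) and \(Q_b\) (notation introduced here for the supervector form), and evaluates one update \(u_{k+1}=u_k+\varepsilon ^{-2}G^*(I+\varepsilon ^{-2}GG^*)^{-1}e_k\) from the measured error \(e_k\):

```python
import numpy as np

# Hypothetical discrete-time model S(A, B, C) on 0 <= t <= N (not from the text)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
x0 = np.array([0.5, 0.0])
N, m, ell = 20, 1, 1

# Supervector map: y(t) = C A^t x0 + sum_{j<t} C A^(t-1-j) B u(j)
G = np.zeros((m * (N + 1), ell * (N + 1)))
d = np.zeros(m * (N + 1))
for t in range(N + 1):
    d[t] = (C @ np.linalg.matrix_power(A, t) @ x0)[0]
    for j in range(t):
        G[t, j] = (C @ np.linalg.matrix_power(A, t - 1 - j) @ B)[0, 0]

# Constant scalar weights Q(t), R(t) and the induced adjoint G* = Rb^{-1} G^T Qb
Q, R, eps2 = 1.0, 1.0, 0.1
Qb, Rb = Q * np.eye(m * (N + 1)), R * np.eye(ell * (N + 1))
G_adj = np.linalg.solve(Rb, G.T @ Qb)

# One feedforward NOILC update computed off-line from the measured error e_k
r = np.sin(np.linspace(0.0, np.pi, N + 1))        # hypothetical reference time series
u_k = np.zeros(ell * (N + 1))
e_k = r - (G @ u_k + d)
L = np.linalg.inv(np.eye(m * (N + 1)) + (G @ G_adj) / eps2)
u_next = u_k + (G_adj / eps2) @ (L @ e_k)
e_next = r - (G @ u_next + d)
print(np.linalg.norm(e_k), "->", np.linalg.norm(e_next))
```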

9.1.4 Relaxed NOILC for Discrete-Time State Space Systems

Continuing with the discussion of the previous section, the relaxed version of the algorithm has two forms.

Feedback Relaxation: Using the feedback relaxation Algorithm 9.2, a relaxed version of Algorithm 9.4 gives the input update in the form \(u_{k+1}=\alpha u_k + \varepsilon ^{-2}G^* e_{k+1}\).

  1.

    Using the equations for Implementation One, a feedback implementation is obtained by modifying the equations for \(\psi _{k+1}\) by the substitution

    $$\begin{aligned} u_k(t) ~\mapsto ~ \alpha u_k(t). \end{aligned}$$
    (9.28)
  2.

    For Implementation Two, the dynamics are written in the form \(y-\alpha y_k = G(u - \alpha u_k) +(1-\alpha )d\) which has a state space model S(ABC) with input \(u(t)- \alpha u_k(t)\), output \(y(t)- \alpha y_k(t)\), state \(x(t)- \alpha x_k(t)\) and initial condition \((1- \alpha ) x_0\). Writing the performance index in the form

    $$\begin{aligned} J(u,u_k)&= \Vert r - \alpha y_k-(y - \alpha y_k)\Vert ^2_{\mathscr {Y}} + \varepsilon ^2 \Vert u - \alpha u_k\Vert ^2_{\mathscr {U}}, \quad gives \nonumber \\ u_{k+1}(t) - u_k(t)&= \varepsilon ^{-2}R^{-1}(t) B^T \left( -K(t)\underbrace{\left( x_{k+1}(t)-\alpha x_k(t) \right) } + \xi _{k+1}(t) \right) , \end{aligned}$$
    (9.29)

    where \(\{\xi _{k+1}(t)\}\) comes from the equations for \(\psi _{k+1}\) using the substitutions

    $$\begin{aligned} u_{k}(t) \mapsto 0 \quad and \quad r(t) \mapsto r(t) - \alpha y_k(t). \end{aligned}$$
    (9.30)

    This gives the modified equation, with \(\xi _{k+1}(N)=0\),

    $$\begin{aligned} \xi _{k+1}(t)&= \left( I+ \varepsilon ^{-2}\tilde{K}(t+1)BR^{-1}(t)B^T \right) ^{-1} \psi _{k+1}(t), \quad ~~~ with, \nonumber \\ \psi _{k+1}(t)&= A^T\xi _{k+1}(t+1) + C^TQ(t+1) \underbrace{\left( r(t+1)-\alpha y_k(t+1) \right) }. \end{aligned}$$
    (9.31)

Feedforward Relaxation: Using the feedforward relaxation Algorithm 9.3, a relaxed version of Algorithm 9.4 is obtained by computing, off-line, the feedforward implementation value of the new input, denoting it by \(\tilde{u}_{k+1}\), and subsequently computing the new input to be applied to the plant from the formula

$$\begin{aligned} u_{k+1}(t) = \alpha u_k(t) + \beta \left( \tilde{u}_{k+1}(t) - u_k(t)\right) , \quad for \quad 0 \le t \le N. \end{aligned}$$
(9.32)

9.1.5 A Note on Frequency Attenuation: The Discrete Time Case

Error evolution is governed by the properties of the operator L. The most precise physical interpretation of this uses eigenvector and eigenvalue analysis but the robustness analysis of later sections will link robustness properties to the frequency domain and the transfer function matrix G(z). The methodology and notation of Sect. 7.2.3 can be applied to the equation \(e_{k+1}=Le_k\) in the frequency domain by supposing that \(e_k = W_j(z_k)\) and noting that \(e_{k+1}\) can then be approximated, if N is large, by

$$\begin{aligned} e_{k+1} \approx (1 + \varepsilon ^{-2}\sigma ^2_j(z_k))^{-1}W_j(z_k), \end{aligned}$$
(9.33)

which links the evolution of individual frequency components in the error to the eigenstructure of \(G(z)R^{-1}G^T(z^{-1})Q\). It is an approximation as initial and terminal conditions for G and \(G^*\) are neglected. It does, however, provide support for the intuitive idea that individual frequency components are influenced by the frequency domain properties of G(z). In particular, it suggests that

  1.

    for low pass systems, high frequency error components are attenuated slowly,

  2.

    frequency content in the vicinity of resonances is attenuated more severely than that in other frequency ranges and,

  3.

    if G(z) has a zero close to the unit circle, frequencies close to that zero will be attenuated very slowly.

Future developments in the choice of weights Q and R could be based on their effects on the eigenvalues \(\sigma ^2_j(z_k)\) (see Theorem 9.20).
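
The attenuation factor in (9.33) is easily visualised for a single-input, single-output example with \(Q=R=1\), where \(\sigma ^2(z)=|G(z)|^2\) on the unit circle. The sketch below uses a hypothetical first-order low-pass model and illustrates the slow attenuation of high frequency components:

```python
import numpy as np

# Hypothetical SISO low-pass model G(z) = b / (z - a), |a| < 1 (not from the text)
a, b = 0.9, 0.1
eps2 = 0.01

theta = np.linspace(0.0, np.pi, 200)          # frequencies on the unit circle
z = np.exp(1j * theta)
G_freq = b / (z - a)
sigma2 = np.abs(G_freq) ** 2                  # sigma^2(z) when Q = R = 1
attenuation = 1.0 / (1.0 + sigma2 / eps2)     # factor applied per iteration, (9.33)

# Low frequencies (large |G|) are attenuated quickly, high frequencies slowly
print("attenuation factor at theta = 0 :", attenuation[0])
print("attenuation factor at theta = pi:", attenuation[-1])
```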

9.1.6 NOILC: The Case of Continuous-Time State Space Systems

The form of the NOILC algorithm for a linear, time-invariant, m-output, \(\ell \)-input, continuous time system S(ABC) is constructed in much the same way as that for the discrete time case in Sects. 9.1.3 and 9.1.4. The dynamics on [0, T] are

$$\begin{aligned} \dot{x}(t)&= Ax(t) + B u(t), ~~&~~ with \quad x(0)=x_0, \nonumber \\ and \quad ~~ y(t)&= C x(t),~&~~ 0 \le t \le T. \end{aligned}$$
(9.34)

The initial condition \(x_0\) is assumed to be iteration independent and the objective of Iterative Control is to find an input \(u_{\infty }\) that tracks, exactly, a specified reference signal \(r \in \mathscr {Y}\) defined by the vector function \(r(t) \in \mathscr {R}^m,~ 0 \le t \le T\). The input and output spaces are \(\mathscr {U}=L_2^{\ell }[0,T]\) and \(\mathscr {Y}=L_2^m[0,T]\) with inner products

$$\begin{aligned} \langle u, v \rangle _{\mathscr {U}} = \int _{0}^{T}~ u^T(t)R(t)v(t)dt \quad and \quad \langle y, w \rangle _{\mathscr {Y}} = \int _{0}^{T}~ y^T(t)Q(t)w(t)dt \end{aligned}$$
(9.35)

where the matrices \(R(t) = R^T(t) >0\) and \(Q(t)=Q^T(t) >0\) for all \(~ t \in [0,T]\).

Algorithm 9.5

(NOILC for Continuous Time, State Space Systems) Using the notation given above, suppose that, on iteration k, the input \(u_k\) was used and generated measured output and error signals \(y_k\) and \(e_k = r- y_k\). NOILC Algorithm 9.1 then constructs the input signal \(u_{k+1}(t),~ 0 \le t \le T,\) to be used on iteration \(k+1\) as the one that minimizes the quadratic objective function

$$\begin{aligned} J(u,u_k)&= \int _{0}^{T}\left( (r(t)-y(t))^TQ(t)(r(t)-y(t)) \right. \nonumber \\&\quad \quad \quad \left. +\, \varepsilon ^2 (u(t)-u_k(t))^TR(t)(u(t)-u_k(t))\right) dt. \end{aligned}$$
(9.36)

This input is then applied to the plant to generate data \(y_{k+1}\) and \(e_{k+1}\), the index is updated and the process repeated indefinitely.

The computations are precisely those needed to solve the optimization problem discussed in Sect. 3.10 with R(t) replaced by \(\varepsilon ^2 R(t)\). By transforming the input update rule \(u_{k+1}=u_k+\varepsilon ^{-2}G^*e_{k+1}\) into a two-point boundary value problem, the solution can be implemented in either a feedback or feedforward form.

Feedback Implementation: There are two feedback implementations. In the feedback Implementation One, the results of Sect. 3.10.4 indicate that the control signal has the form

$$\begin{aligned} u_{k+1}(t) = u_k(t) + \varepsilon ^{-2} R^{-1}(t) B^T \left( - K(t)x_{k+1}(t) + \xi _{k+1}(t) \right) , \quad 0 \le t \le T, \end{aligned}$$
(9.37)

where \(x_{k+1}(t)\) is the measured state vector at time t on iteration \(k+1\). In addition,

  1.

    the (iteration independent) \(n \times n\), time varying, state feedback matrix \(K(t),~ 0 \le t \le T\), is computed off-line before initialization of the iterative procedure by solving the nonlinear matrix differential equation

    $$\begin{aligned} \frac{dK(t)}{dt} + A^TK(t) + K(t)A - \varepsilon ^{-2} K(t)BR^{-1}(t)B^TK(t) + C^TQ(t)C = 0 \end{aligned}$$
    (9.38)

    with the terminal condition \(K(T)=0\).

  2.

    The predictive term \(\xi _{k+1}(t),~ 0 \le t \le T\), is computed from the terminal condition \(\xi _{k+1}(T)=0\) and the differential equation on [0, T],

    $$\begin{aligned} {}\frac{d\xi _{k+1}(t)}{dt} = -\left( \!A^T - \varepsilon ^{-2} K(t)BR^{-1}(t)B^T\!\right) \xi _{k+1}(t) {-} C^TQ(t)r(t) {+} K(t)B u_k(t). \end{aligned}$$
    (9.39)

Again, K(t) is computed only once and used on each and every iteration but it is necessary to recompute the predictive term \(\xi _{k+1}\) for each iteration. It is computed off-line in the time between iteration k and \(k+1\) when the system is being reset.
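
A minimal computational sketch of the off-line solution of the Riccati equation (9.38) is given below, integrating backwards in time from the terminal condition \(K(T)=0\) (hypothetical system matrices and constant weights are assumed):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical continuous-time model and weights (not from the text)
A = np.array([[0.0, 1.0], [-2.0, -1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.array([[1.0]]), np.array([[1.0]])
eps2, T = 0.1, 5.0
n = A.shape[0]

def riccati_rhs(tau, k_flat):
    # Reversed time tau = T - t turns the terminal condition K(T) = 0 into an
    # initial condition; (9.38) becomes
    # dK/dtau = A^T K + K A - eps^{-2} K B R^{-1} B^T K + C^T Q C
    K = k_flat.reshape(n, n)
    dK = (A.T @ K + K @ A
          - (K @ B @ np.linalg.solve(R, B.T) @ K) / eps2
          + C.T @ Q @ C)
    return dK.ravel()

sol = solve_ivp(riccati_rhs, (0.0, T), np.zeros(n * n), dense_output=True)
K_at = lambda t: sol.sol(T - t).reshape(n, n)    # recover K(t) from tau = T - t
print("K(0) =\n", K_at(0.0))
print("K(T) =\n", K_at(T))                       # terminal condition (zero)
```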

In feedback Implementation Two, the ideas expressed in Sect. 9.1.3 can again be applied to replace the control law by

$$\begin{aligned} u_{k+1}(t) = u_k(t) + \varepsilon ^{-2} R^{-1}(t) B^T \left( - K(t)(x_{k+1}(t)-x_k(t)) + \xi _{k+1}(t) \right) , \end{aligned}$$
(9.40)

where, using the same terminal condition,

$$\begin{aligned} \frac{d\xi _{k+1}(t)}{dt} = -\left( A^T - \varepsilon ^{-2} K(t)BR^{-1}(t)B^T\right) \xi _{k+1}(t) - C^TQ(t)e_k(t). \end{aligned}$$
(9.41)

Intuitively, this version will suit applications better as \(\xi _{k+1}\) responds directly to the measured error data rather than the input used.

Feedforward Implementation: Given the data \(r,u_k,e_k\), the feedforward implementation of NOILC for continuous time state space systems is simply stated as the use of models to calculate the new input \(u_{k+1}\), off-line, in the time between iterations k and \(k+1\). These calculations can be approached using the formulae given for the feedback case above. Following its evaluation, \(u_{k+1}\) is applied to the plant and new data \(e_{k+1}\) obtained from sensors to allow iterations to continue. The discussion of the discrete case in Sect. 9.1.3 applies here. That is, in order to make the algorithm respond, explicitly, to the previously observed error \(e_k\), the computations associated with Implementation Two seem to be the most appropriate.

Finally, the relaxed versions of the algorithm are simply stated as follows:

Feedback Relaxation: Using the feedback relaxation Algorithm 9.2 and a similar analysis to that of Sect. 9.1.4, a relaxed version of Algorithm 9.5 is obtained in two forms. Implementation One replaces \(u_k\) by \(\alpha u_k\) to give

$$\begin{aligned} u_{k+1}(t)&= \alpha u_k(t) + \varepsilon ^{-2} R^{-1}(t) B^T \left( - K(t)x_{k+1}(t) + \xi _{k+1}(t) \right) , \quad and \nonumber \\ \frac{d\xi _{k+1}(t)}{dt}&= -\left( \!A^T - \varepsilon ^{-2} K(t)BR^{-1}(t)B^T\!\right) \xi _{k+1}(t) {-} C^TQ(t)r(t){+} \alpha K(t)B u_k(t)~ \end{aligned}$$
(9.42)

with the terminal condition \(\xi _{k+1}(T)=0\). In contrast, Implementation Two uses

$$\begin{aligned} u_{k+1}(t)&= \alpha u_k(t) + \varepsilon ^{-2} R^{-1}(t) B^T \left( - K(t)(x_{k+1}(t)-x_k(t)) + \xi _{k+1}(t) \right) , \quad and \nonumber \\ \frac{d\xi _{k+1}(t)}{dt}&~= -\left( A^T - \varepsilon ^{-2} K(t)BR^{-1}(t)B^T\right) \xi _{k+1}(t) - C^TQ(t)\left( r(t) - \alpha y_k(t)\right) ~ \end{aligned}$$
(9.43)

which is driven by the measured output \(y_k\).

Feedforward Relaxation: Using the feedforward relaxation Algorithm 9.3, a relaxed version of Algorithm 9.5 is obtained by computing, off-line, the input \(\tilde{u}_{k+1}\) generated by Implementation Two of Algorithm 9.5. The new input to be applied to the plant is then obtained from the formula

$$\begin{aligned} u_{k+1}(t) = \alpha u_k(t) + \beta \left( \tilde{u}_{k+1}(t) - u_k(t)\right) , \quad for \quad 0 \le t \le T. \end{aligned}$$
(9.44)

9.1.7 Convergence, Eigenstructure, \(\varepsilon ^2\) and Spectral Bandwidth

Sections 9.1.3 and 9.1.6 provide simple computations for practical implementations. They carry no obvious information on the effects of the reference signal r or the objective function weights Q and R on algorithm performance. There is no explicit relationship available but a useful insight can be obtained by assuming an eigenstructure for \(GG^*\) and examining the convergence in terms of the evolution of the eigenvector components of the error. A sufficient condition for such an eigenstructure to exist is that \(\mathscr {Y}\) is finite dimensional as, for example, in the case of discrete state space models.

Section 5.2.4 has revealed the potential power of eigenstructure in iterative analysis. The main value of the results was in increasing the understanding of iteration behaviour. The techniques are not aimed at practical computation as, in practice, computation of the eigenvalues and eigenvectors of high or infinite dimensional operators is a difficult or impossible task. In this section, the assumption of an eigenstructure for \(GG^*\) is used to create a greater understanding of the “internal dynamics” of the NOILC algorithms. Note that

$$\begin{aligned} \mathscr {Y} = ker[G^*] \oplus \overline{\mathscr {R}[GG^*]} \end{aligned}$$
(9.45)

is an orthogonal subspace decomposition of \(\mathscr {Y}\) and \(ker[G^*]\) is the eigenspace of \(GG^*\) corresponding to zero eigenvalues. By construction, \(GG^*\) maps \(\overline{\mathscr {R}[GG^*]}\) into the dense subspace \(\mathscr {R}[GG^*]\) of \(\overline{\mathscr {R}[GG^*]}\). Assume that \(GG^*: \overline{\mathscr {R}[GG^*]} \rightarrow \overline{\mathscr {R}[GG^*]}\) has eigenvalues \(\{\sigma _j^2\}_{j \ge 1}\) with corresponding orthonormal eigenvectors \(\{v_j\}_{j \ge 1}\) that span \(\overline{\mathscr {R}[GG^*]}\). Then, by construction, \(\sigma _j^2 >0, ~~ j \ge 1\) and, by reordering,

$$\begin{aligned} \sigma ^2_1 \ge \sigma ^2_2 \ge \sigma ^2_3 \ge \cdots ~ \quad (where ~~multiplicities ~~ are~~permitted). \end{aligned}$$
(9.46)

As \(GG^*\) is self-adjoint,

$$\begin{aligned} r(GG^*)=\Vert GG^*\Vert =\Vert G^*\Vert ^2=\sigma ^2_1. \end{aligned}$$
(9.47)

Consider now the operator \(L=(I+\varepsilon ^{-2}GG^*)^{-1}\) and note that \(Le=e\) for all \(e\in ker[G^*]\) and that \(Lv_j = (1+\varepsilon ^{-2}\sigma ^2_j)^{-1}v_j\) for all \(j \ge 1\).

Consider Algorithm 9.1 with initial error \(e_0\) decomposed into the sum \(e_0 = e^{(1)}_0+e^{(2)}_0\) with \(e^{(1)}_0 \in ker[G^*]\) and \(e^{(2)}_0 \in \overline{\mathscr {R}[GG^*]}\). Then, writing \(e^{(2)}_0 = \sum _{j \ge 1} ~\gamma _j v_j\),

$$\begin{aligned} \Vert e_0^{(2)}\Vert ^2_{\mathscr {Y}} = \sum \nolimits _{j \ge 1}~\gamma ^2_j,&~ ~~so ~~ that \quad \lim \nolimits _{j \rightarrow \infty } \gamma _j =0 \nonumber \\ and ~~~\quad e_k = L^k e_0&= e^{(1)}_0 + \sum \nolimits _{j \ge 1} ~\underbrace{(1+\varepsilon ^{-2}\sigma ^2_j)^{-k}\gamma _j v_j}. \end{aligned}$$
(9.48)

The simple observations suggested by this analysis include,

  1.

    The component of \(e_0\) in \(ker[G^*]\) remains unchanged from iteration to iteration.

  2.

    The contribution of the eigenvector \(v_j\) decreases in significance from iteration to iteration at a rate governed by the power rule \((1+\varepsilon ^{-2}\sigma ^2_j)^{-k}, ~~ k \ge 0\). Hence, defining \(\sigma ^2_{\infty } = \inf _{j \ge 1}~\sigma _j^2\),

    a.

      if \(\sigma ^2_{\infty } >0\), then \(\Vert e_k - e_0^{(1)}\Vert \le (1+\varepsilon ^{-2}\sigma ^2_{\infty })^{-k} \Vert e_0 - e_0^{(1)}\Vert \) so that \(e_k\) converges to \(e_0^{(1)}\), the orthogonal projection of \(e_0\) onto \(ker[G^*]\).

    b.

      If \(\sigma ^2_{\infty } = 0\), the convergence properties are retained but the magnitude of the contribution of eigenvectors corresponding to eigenvalues \(\sigma ^2_j \ll \varepsilon ^2\) reduces infinitesimally slowly. This observation provides insight into the effect of the reference signal. More precisely, rapid convergence (faster than some power rule \((1+\delta ^2)^{-k}\), with \(\delta ^2 > 0\)) to small errors will only occur if \(e_0^{(1)}=0\) and r is dominated by the contribution of eigenvectors with eigenvalues \(\sigma ^2_j\) similar in magnitude to or greater than \(\varepsilon ^2 \delta ^2\). It is important to note that the number of reference signals in this set increases as \(\varepsilon ^2\) reduces suggesting that fast convergence is then achieved for a wider class of reference signals.

    c.

      In many situations, the eigenvectors corresponding to very small eigenvalues will be associated with “high frequency” characteristics of the plant. This idea is supported by the state space example in Sect. 5.2.5 where eigenfunctions are associated with terms in a Fourier series representation. This can be a good guide to physical behaviour but it does not tell the whole story as is seen by noting (Sect. 8.2.4) that non-minimum-phase behaviours are associated with infinitesimally small eigenvalues but are not a high frequency phenomenon.

    d.

      The signals with the fastest convergence to zero are those in the subspace spanned by eigenvectors with eigenvalues equal to \(\sigma _1^2 = \Vert GG^*\Vert \). That is, the choice of inner products in \(\mathscr {Y}\) and \(\mathscr {U}\) (which influence the form of \(G^*\)) also influences convergence rates.

  3.

    Finally, the norm and spectral radius can be computed as follows,

    a.

      \(r(L)=\Vert L\Vert =1\) if \(ker[G^*] \ne \{0\}\).

    b.

      If \(ker[G^*] = \{0\}\), then \(r(L)=\Vert L\Vert = (1+\varepsilon ^{-2}\sigma ^2_{\infty })^{-1}\).

    In case (b), if \(\sigma ^2_{\infty } > 0\), the convergence of the algorithm to zero error is guaranteed. If, however, \(\sigma ^2_{\infty }=0\) then the reader can verify that \(e_k \rightarrow 0\) in the weak topology in \(\mathscr {Y}\). A proof of convergence in norm is left for the next section.
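
The eigen-decomposition underlying (9.48) is easily reproduced numerically. The sketch below (a hypothetical finite-dimensional example with \(G^*=G^T\)) computes the eigenvalues \(\sigma _j^2\) and orthonormal eigenvectors \(v_j\) of \(GG^*\) and confirms that each eigen-component of the error decays at the rate \((1+\varepsilon ^{-2}\sigma _j^2)^{-k}\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
G = rng.standard_normal((n, n))                  # hypothetical model data
d, r = rng.standard_normal(n), rng.standard_normal(n)
eps2 = 0.5
L = np.linalg.inv(np.eye(n) + (G @ G.T) / eps2)

# Eigenstructure of GG^* (self-adjoint, so eigh applies); here ker[G^*] = {0}
sigma2, V = np.linalg.eigh(G @ G.T)              # columns of V: orthonormal eigenvectors

e0 = r - (G @ np.zeros(n) + d)
gamma = V.T @ e0                                 # eigen-components gamma_j of e0

k = 8
ek = np.linalg.matrix_power(L, k) @ e0
# Component-wise prediction from (9.48): gamma_j (1 + eps^{-2} sigma_j^2)^{-k}
predicted = V @ (gamma * (1.0 + sigma2 / eps2) ** (-k))
assert np.allclose(ek, predicted)
print("slowest decay factor per iteration:", 1.0 / (1.0 + sigma2.min() / eps2))
```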

These observations indicate a link between any eigenstructure of \(GG^*\) and convergence rates. In the following paragraphs, a simple parametric characterization of the link between convergence and the spectrum of \(GG^*\) is suggested as an aid to design discussions and the choice of \(\varepsilon ^2\).

Definition 9.1

(The Concept of Spectral Bandwidth) Suppose that \(GG^*\) has an eigenstructure as discussed in the preceding paragraphs and that \(e_0=\sum _{j \ge 1}~\gamma _j v_j\). Then, given two real numbers \(\lambda \) and \(\mu \) in the half-open interval (0, 1], the NOILC Iterative Control algorithm is said to have a Spectral Bandwidth \(S_{BW}(\lambda , \mu )\) if, and only if, the contribution, to the error signal, of all eigenvectors \(v_j\) with eigenvalues \(\sigma _j^2 \ge \lambda \Vert G^*\Vert ^2\) decay at a rate bounded from above by the geometric sequence \(\mu ^k\gamma _j\).

As \(e_k=L^ke_0\), it follows that \(e_k=\sum _{j \ge 1}~(1+\varepsilon ^{-2}\sigma ^2_j)^{-k}\gamma _j v_j\) and hence a spectral bandwidth \(S_{BW}(\lambda , \mu )\) is achieved if

$$\begin{aligned} (1+\varepsilon ^{-2}\lambda \Vert G^*\Vert ^2)^{-1} \le \mu ~ \quad for ~~ the ~~ chosen \quad \varepsilon ^2 > 0. \end{aligned}$$
(9.49)

This condition has application to the choice of \(\varepsilon ^2\) for a given choice of \(\lambda \) and \(\mu \).

  1.

    If attention is focussed on the convergence rate of \(\sigma _1^2\) only, then \(\lambda = 1\). If it is seen as desirable that this eigen-component at least halves in magnitude from iteration to iteration, then it must converge faster than \(\left( \frac{1}{2}\right) ^k\) and the choice of \(\mu = 0.5\) is appropriate. The spectral bandwidth \(S_{BW}(1, 0.5)\) is then achieved for any choice of \(0 < \varepsilon ^2 \le \Vert G^*\Vert ^2\).

  2.

    If attention is focussed on the convergence rate of eigenvectors with eigenvalues greater than \(\frac{1}{2}\sigma _1^2\) only, then \(\lambda = 0.5\). If it is seen as desirable that all such eigen-components converge faster than \(\left( \frac{1}{2}\right) ^k\), choose \(\mu = 0.5\). The spectral bandwidth \(S_{BW}(0.5, 0.5)\) is then achieved for any choice of \(0 < \varepsilon ^2 \le 0.5 \Vert G^*\Vert ^2\).

  3.

    If attention is focussed on the convergence rate of eigenvectors with eigenvalues greater than \(\frac{1}{2}\sigma _1^2\) only, then \(\lambda = 0.5\). If it is seen as desirable that all such eigen-components converge faster than \(\left( \frac{1}{3}\right) ^k\), choose \(\mu = 0.33\). The spectral bandwidth \(S_{BW}(0.5, 0.33)\) is then achieved for any choice of \(0 < \varepsilon ^2 \le 0.25 \Vert G^*\Vert ^2\).
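
The three examples follow from a simple rearrangement of (9.49): for \(\mu < 1\), the spectral bandwidth \(S_{BW}(\lambda , \mu )\) is achieved whenever \(\varepsilon ^2 \le \lambda \Vert G^*\Vert ^2 \mu /(1-\mu )\). A small helper function illustrating this choice (a hypothetical utility, with \(\Vert G^*\Vert ^2\) supplied by the user) is sketched below:

```python
def max_eps2_for_bandwidth(lam, mu, norm_G_adj_sq):
    """Largest eps^2 achieving the spectral bandwidth S_BW(lam, mu); from (9.49),
    (1 + eps^{-2} lam ||G*||^2)^{-1} <= mu  <=>  eps^2 <= lam ||G*||^2 mu / (1 - mu)."""
    assert 0.0 < lam <= 1.0 and 0.0 < mu < 1.0
    return lam * norm_G_adj_sq * mu / (1.0 - mu)

# Reproduce the three numbered examples (with ||G*||^2 normalised to 1)
print(max_eps2_for_bandwidth(1.0, 0.5, 1.0))      # 1.0  -> eps^2 <= ||G*||^2
print(max_eps2_for_bandwidth(0.5, 0.5, 1.0))      # 0.5  -> eps^2 <= 0.5 ||G*||^2
print(max_eps2_for_bandwidth(0.5, 1 / 3, 1.0))    # 0.25 -> eps^2 <= 0.25 ||G*||^2
```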

Note that achieving a specific spectral bandwidth \(S_{BW}(\lambda , \mu )\) does not imply that error convergence satisfies \(\Vert e_k\Vert \le \mu ^k \Vert e_0\Vert \). The precise form of convergence depends on the initial error \(e_0\) and hence r and \(u_0\). Loosely speaking, such convergence is achieved approximately for any \(e_0\) whose eigenvector representation is dominated by eigenvectors with eigenvalues in the range \(\lambda \Vert G^*\Vert ^2 \le \sigma _j^2 \le \Vert G^*\Vert ^2\). A more precise description is obtained by assuming that a spectral bandwidth \(S_{BW}(\lambda , \mu )\) has been achieved and then writing

$$\begin{aligned} e_0 = \sum _{\sigma _j^2 \ge \lambda \Vert G^*\Vert ^2} \gamma _j v_j + \sum _{\sigma _j^2 < \lambda \Vert G^*\Vert ^2} \gamma _j v_j = e_0^{(1)} + e_0^{(2)} \end{aligned}$$
(9.50)

with the natural identification of terms. It follows that,

$$\begin{aligned} \Vert e_k\Vert _{\mathscr {Y}} \le \mu ^k\Vert e^{(1)}_0\Vert _{\mathscr {Y}} + \Vert e^{(2)}_0\Vert _{\mathscr {Y}} \end{aligned}$$
(9.51)

and hence that the error sequence convergence can be thought of as an initial convergence following the power law \(\mu ^k\) to a closed ball centred on the origin of radius \(\Vert e^{(2)}_0\Vert _{\mathscr {Y}}\). This is followed by slower convergence to its final limit. If \(\Vert e^{(2)}_0\Vert _{\mathscr {Y}} \ll \Vert e_0\Vert _{\mathscr {Y}}\) then the algorithm achieves accurate tracking quickly. Note that reducing \(\lambda \) with \(\mu \) fixed will (a) require a reduction in the value of \(\varepsilon ^2\) and (b) include more terms in the expression for \(e_0^{(1)}\) and hence reduce \(\Vert e^{(2)}_0\Vert _{\mathscr {Y}}\).

Finally, a good choice of \(\varepsilon ^2\) is clearly related to desired convergence properties and the norm \(\Vert G^*\Vert ^2\). For asymptotically stable, linear, time invariant, discrete, state space systems S(ABCD) for example, previous calculations in Sect. 4.8.2 have proved that \(\Vert G^*\Vert ^2 = \Vert G\Vert ^2\) and provided a bound described by the spectral radius \(r\left( R^{-1}G^T(z^{-1})QG(z) \right) \) that is accurate if N is large.

9.1.8 Convergence: General Properties of NOILC Algorithms

The previous section has demonstrated the power of an eigenstructure interpretation of algorithm behaviour. However, the form of L and the fact that it is self-adjoint make it possible to make useful statements about its spectral radius and norm and hence the convergence properties of the algorithm without appealing to or assuming any eigenstructure. These relationships are stated as formal theorems in what follows. The first defines useful properties of L as follows,

Theorem 9.1

(General Properties of \(L=(I+\varepsilon ^{-2}GG^*)^{-1}\)) With the notation defined above, the operator L in the NOILC Algorithm 9.1 satisfies the conditions

$$\begin{aligned} ker[L] = \{0\} \quad ~~ and \quad ~~0 < (1+\varepsilon ^{-2} \Vert G^*\Vert ^2)^{-1}I \le L \le I. \end{aligned}$$
(9.52)

In particular,

  1.

    If there exists a scalar \(\varepsilon _0 >0\) such that \(GG^* \ge \varepsilon _0^2I\), then

    $$\begin{aligned} (1+\varepsilon ^{-2}\Vert G^*\Vert ^2)^{-1}I \le L \le (1+\varepsilon ^{-2}\varepsilon _0^2)^{-1}I~ < I. \end{aligned}$$
    (9.53)
  2.

    If no such value of \(\varepsilon _0\) exists, then \(L < I\) if, and only if, \(~ker[G^*] = \{0\}\).

  3.

    \(Le_0=e_0\) if, and only if, \(e_0 \in ker[G^*]\). That is, there exists a non-zero initial error \(e_0\) such that no improvement in tracking error is possible using NOILC if, and only if, \(~ker[G^*] \ne \{0\}\).

Proof

Using the NOILC interpretation, denote \(e_1 = Le_0\). If \(e_0 \ne 0\) and \(e_1 = Le_0=0\), then \(u_1 \ne u_0\) and \(J(u_1, u_0) = \langle e_0,Le_0 \rangle = \varepsilon ^2\Vert u_1 -u_0\Vert ^2>0\), which is not possible as \(\langle e_0,Le_0 \rangle = \langle e_0,e_1 \rangle = 0\). Hence \(Le_0=0\) implies \(e_0=0\) and \(ker[L]=\{0\}\) as required. Next, note that \(\Vert e_1\Vert \le \Vert e_0\Vert \) for all \(e_0 \in \mathscr {Y}\). It follows that \(\Vert L\Vert \le 1\). That is, as L is self-adjoint, \(L \le \Vert L\Vert I \le I\) as required. Also, \(J(u_1,u_0) = \langle e_0, L e_0 \rangle > 0\) for all \(e_0 \ne 0\) and hence \(L > 0\). Next, let \(H=H^*\) be a self-adjoint, positive operator and denote the unique positive-definite, self-adjoint, square root of \((I+H)^{-1}\) by X. Then

$$\begin{aligned} (I+H)^{-1} - (I+\Vert H\Vert )^{-1}I&= (I+H)^{-1}(I+\Vert H\Vert )^{-1}\left( \Vert H\Vert I - H \right) \nonumber \\&= (I+\Vert H\Vert )^{-1}X\left( \Vert H\Vert I-H \right) X \ge 0 \end{aligned}$$
(9.54)

as \(\Vert H\Vert I - H \ge 0\). Choosing \(H=\varepsilon ^{-2}GG^*\) (respectively, \(H = \varepsilon ^{-2}G^*G\)) then gives

$$\begin{aligned} \begin{array}{rl} &{} (I+\varepsilon ^{-2}GG^*)^{-1} - (I+\varepsilon ^{-2}\Vert G^*\Vert ^2)^{-1}I \ge 0 \\ (resp. ~~~ &{} (I+\varepsilon ^{-2}G^*G)^{-1} - (I+\varepsilon ^{-2}\Vert G\Vert ^2)^{-1}I \ge 0) \end{array} \end{aligned}$$
(9.55)

as \(\Vert GG^*\Vert =\Vert G^*\Vert ^2\) and \(\Vert G^*G\Vert =\Vert G\Vert ^2\). Now, consider \(I-L\) in the form

$$\begin{aligned} \begin{array}{c} I-L = (I+\varepsilon ^{-2}GG^*)^{-1}\varepsilon ^{-2}GG^* = G(I+\varepsilon ^{-2}G^*G)^{-1}\varepsilon ^{-2}G^*, \quad and ~~ hence \\ \varepsilon ^{-2}\Vert G^*e_0\Vert ^2_{\mathscr {U}} \ge \langle e_0, (I-L)e_0 \rangle _{\mathscr {Y}} \ge (I+\varepsilon ^{-2}\Vert G\Vert ^2)^{-1} \varepsilon ^{-2}\Vert G^*e_0\Vert ^2_{\mathscr {U}}~ \end{array} \end{aligned}$$
(9.56)

which proves that \(Le_0=e_0\) if, and only if, \(e_0 \in ker[G^*]\) and also that \(L < I\) if \(ker[G^*]=\{0\}\).

Finally, suppose that \(GG^* \ge \varepsilon _0^2I\) and that \(e_0\ne 0\). Again writing \(H=GG^*\),

$$\begin{aligned} (I+\varepsilon ^{-2}H)^{-1} - (I+\varepsilon ^{-2}\varepsilon _0^2)^{-1}I&= (I+\varepsilon ^{-2}H)^{-1}(I+\varepsilon ^{-2}\varepsilon _0^2)^{-1}\varepsilon ^{-2}\left( \varepsilon _0^2I - H \right) \nonumber \\&\le 0 \end{aligned}$$
(9.57)

which completes the proof of the result.   \(\square \)

Discussion: The result has a number of useful consequences,

  1.

    The property \(ker[L] = \{0\}\) indicates that, if \(e_0 \ne 0\), all following \(e_k \ne 0\).

  2.

    The subspace \(ker[G^*]\) plays an important role in convergence properties. Monotonicity of the error norm sequence becomes strict monotonicity, \(\Vert e_{k+1}\Vert < \Vert e_k\Vert , ~~ k \ge 0~\) if \(ker[G^*] = \{0\}\). However, if \(ker[G^*] \ne \{0\}\), then there exist initial errors for which no change in error can be achieved.

  3.

    As \(ker[G^*] = \mathscr {R}[G]^{\perp }\), the condition \(ker[G^*] = \{0\}\) implies that the range of G is dense in \(\mathscr {Y}\). More generally, using the fact that \(ker[G^*] = ker[GG^*] \),

    $$\begin{aligned} \mathscr {Y} = \overline{\mathscr {R}[G]} \oplus ker[G^*]~~~and~~~\overline{\mathscr {R}[GG^*]} = \overline{\mathscr {R}[G]}. \end{aligned}$$
    (9.58)

    from which it is concluded that \(\mathscr {R}[GG^*]\) is dense in \(\mathscr {R}[G]\).

  4.

    The positivity properties of \(GG^*\) play an important role in the characterization of \(\Vert L\Vert \) and situations when \(L < I\). These properties, and the results in Sect. 5.2 are central to the important convergence properties discussed in the next theorem.

  5.

    Using the Spectral Mapping Theorem gives the inclusion conditions

    $$\begin{aligned} if ~~ GG^* \ge \varepsilon _0^2 I > 0,~~ then, \quad spec[L]&\subset [(1 + \varepsilon ^{-2}\Vert G^*\Vert ^2)^{-1}, (1 + \varepsilon ^{-2}\varepsilon _0^2)^{-1}], \nonumber \\ and \quad spec[L]&\subset ~ [(1 + \varepsilon ^{-2}\Vert G^*\Vert ^2)^{-1}, 1] \quad otherwise. \nonumber \\ Also \quad spec[\alpha L]&= \alpha spec[L] \nonumber \\ and \quad spec[(\alpha - \beta )I+\beta L]&= (\alpha - \beta )+\beta spec[L]. \end{aligned}$$
    (9.59)

    These relations can be used to bound operator norms as the operators are self adjoint and hence the spectral radius is equal to the operator norm.

Turning now to the issues of convergence in NOILC algorithms,

Theorem 9.2

(Error Convergence in NOILC Algorithm 9.1) Application of the NOILC Algorithm 9.1 to the dynamics \(y=Gu+d\) with reference r and initial input \(u_0\) has the following error convergence properties

  1.

    \(\Vert e_{k+1}\Vert _{\mathscr {Y}} \le \Vert e_k\Vert _{\mathscr {Y}}\) for all \(k \ge 0\) with strict inequality holding if \(ker[G^*] = \{0\}\) and \(e_0 \ne 0\).

  2.

    If \(\Vert L\Vert < 1\), then the error sequence converges to zero as \(k \rightarrow \infty \).

  3.

    If \(\Vert L\Vert =1\), then the error sequence converges to zero as \(k \rightarrow \infty \) if the initial error \(e_0 \in \overline{\mathscr {R}[I-L]} = \overline{\mathscr {R}[G]}\). If, in addition, \(ker[G^*] = \{0\}\), then \(L<I\) and convergence to zero is achieved for all \(e_0 \in \mathscr {Y}\).

  4.

    In general, the error sequence converges to \(P_{ker[G^*]}e_0\) as \(k \rightarrow \infty \) where \(P_{ker[G^*]}\) is the self adjoint, positive operator defining the orthogonal projection of a vector onto the closed subspace \(ker[G^*] \subset \mathscr {Y}\).

Proof

The monotonicity properties have been discussed previously and need no further comment. If \(\Vert L\Vert < 1\), then convergence to zero follows from Theorem 5.4. Next, if \(\Vert L\Vert =1\), then Theorem 5.9 shows that \(e_k \rightarrow 0\) whenever \(e_0 \in \overline{\mathscr {R}[I-L]}\) which is just \(e_0 \in \overline{\mathscr {R}[GG^*]}\) as \(I-L = \varepsilon ^{-2}GG^*(I+\varepsilon ^{-2}GG^*)^{-1}\) and hence for any \(e_0 \in \overline{\mathscr {R}[G]}\) as \(\overline{\mathscr {R}[GG^*]} =\overline{\mathscr {R}[G]}\). If, in addition, \(ker[G^*] = \{0\}\), then Theorem 9.1 proves that \(I-L>0\). Theorem 5.10 then proves convergence to zero for all \(e_0 \in \mathscr {Y}\). Finally, the projection characterization of the limit follows from Theorem 5.9 using the identity \(ker[I-L] = ker[G^*]\).   \(\square \)

As in Theorem 7.2, the convergence of the input signal sequence \(\{u_k\}_{k \ge 0}\) requires an additional condition, namely that \(e_0 \in \mathscr {R}[I-L]=\mathscr {R}[GG^*]\). The following result has a similar structure and demonstrates that NOILC retains many of the desirable properties of steepest descent algorithms.

Theorem 9.3

(Input Convergence in NOILC Algorithm 9.1) With the notation of Theorem 9.2 suppose that \(e_0 \in \mathscr {R}[I-L]=\mathscr {R}[GG^*]\). Under these conditions, the sequence \(\{u_k\}_{k \ge 0}\) converges in norm to the unique input \(u_{\infty } \in \mathscr {U}\) satisfying the tracking relationship \(r=Gu_{\infty } +d\) whilst simultaneously minimizing the norm \(\Vert u - u_0\Vert ^2_{\mathscr {U}}\). That is,

$$\begin{aligned} u_{\infty } = \arg \min \{~ \Vert u-u_0\Vert ^2_{\mathscr {U}}~: r=Gu+d ~ \}. \end{aligned}$$
(9.60)

Proof

Convergence of the error to \(e_{\infty }=0\) follows from Theorem 9.2. Apply induction to the input update equation to show that

$$\begin{aligned} u_{k+1} = u_0 + \varepsilon ^{-2} G^*L\sum _{j=0}^k~L^je_0. \end{aligned}$$
(9.61)

As \(e_0 \in \mathscr {R}[I-L]\), write \(e_0 = (I-L)v_0\) for some \(v_0 \in \mathscr {Y}\) and, using the orthogonal subspace decomposition \(\mathscr {Y} = ker[I-L] \oplus \overline{\mathscr {R}[I-L]}\), take, without loss of generality \(v_0 \in \overline{\mathscr {R}[I-L]}\). Theorem 5.9 then proves that \(L^k v_0 \rightarrow 0\) as \(k \rightarrow \infty \) and

$$\begin{aligned} u_{k+1}&= u_0 + \varepsilon ^{-2} G^*L\left( \sum \nolimits _{j=0}^k~L^j \right) (I-L)v_0 ~=~ u_0 + \varepsilon ^{-2} G^*L(I -L^{k+1})v_0 \nonumber \\&\rightarrow u_{\infty } ~=~u_0 + \varepsilon ^{-2} G^*Lv_0 \quad \quad (as ~~ k \rightarrow \infty ) \end{aligned}$$
(9.62)

and, as \(e_{\infty }=0\), it follows that \(r=Gu_{\infty }+d\) as required. Finally, the equations define the minimum norm solution as, for any u in the closed linear variety \(S=\{~u~: ~~r=Gu + d \} \subset \mathscr {U}\), the inner product \(\langle u-u_{\infty }, u_{\infty }-u_0 \rangle _{\mathscr {U}} = \langle u-u_{\infty }, G^*Lv_0 \rangle _{\mathscr {U}} = \langle G(u - u_{\infty }), Lv_0 \rangle _{\mathscr {Y}} =0\) as \(G(u - u_{\infty })=0\). This is precisely the condition defining the orthogonal projection of \(u_0\) onto S which, from the Projection Theorem 2.17, defines the unique solution of the minimum norm problem.   \(\square \)
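
The limit characterization in Theorem 9.3 can be checked numerically in finite dimensions. The sketch below (hypothetical data, standard Euclidean inner products so that the minimum norm solution can be compared with a pseudo-inverse computation, and a wide G with \(ker[G^*]=\{0\}\)) iterates NOILC to convergence and compares the limit input with the minimum norm solution of \(r=Gu+d\):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 4, 6                      # more inputs than outputs: ker[G^*] = {0} generically
G = rng.standard_normal((m, n))  # hypothetical model data
d, r = rng.standard_normal(m), rng.standard_normal(m)
eps2 = 0.5
u0 = rng.standard_normal(n)

u, e = u0.copy(), r - (G @ u0 + d)
for _ in range(3000):            # iterate NOILC to (numerical) convergence
    u = u + np.linalg.solve(G.T @ G + eps2 * np.eye(n), G.T @ e)
    e = r - (G @ u + d)

# Minimum norm solution of r = G u + d closest to u0, via the pseudo-inverse
u_star = u0 + np.linalg.pinv(G) @ (r - d - G @ u0)
print(np.linalg.norm(e), np.linalg.norm(u - u_star))   # both should be very small
```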

The inclusion of relaxation simplifies the result considerably but requires a characterization of the non-zero limit error,

Theorem 9.4

(Error Convergence in Relaxed NOILC: The Feedback Case) Application of the relaxed feedback NOILC Algorithm 9.2 with \( 0 \le \alpha < 1\) to the dynamics \(y=Gu+d\) with reference r and initial input \(u_0\) has the following convergence properties

$$\begin{aligned} \lim _{k \rightarrow \infty } e_k = e_{\infty } \quad where \quad e_{\infty } = L\left[ \alpha e_{\infty } + (1 - \alpha )(r-d)\right] . \end{aligned}$$
(9.63)

In particular, this equation has the unique solution

$$\begin{aligned} e_{\infty } = (I+(1 - \alpha )^{-1}\varepsilon ^{-2} GG^*)^{-1}(r-d), \end{aligned}$$
(9.64)

and hence, \(\lim _{k \rightarrow \infty } u_k = u_{\infty }\), which is the input signal that minimizes the quadratic performance index

$$\begin{aligned} J(u) = \Vert e\Vert ^2_{\mathscr {Y}} + (1-\alpha )\varepsilon ^2 \Vert u\Vert ^2_{\mathscr {U}} \quad with \quad e=r-y, \end{aligned}$$
(9.65)

subject to the constraints \(y = Gu+d\).

The result has an interesting performance and design interpretation,

  1.

    The result ensures convergence to a limit that has a clear interpretation as the solution of a quadratic optimization problem where the input weighting is reduced by a factor of \(1 - \alpha \).

  2.

    If \(\alpha \) is close to unity, the weighting of the input is very close to zero—a so-called “cheap” optimal control problem. In essence, this problem tries to minimize the error norm with only a small penalty on large deviations of the input signal from zero. As \(\varepsilon ^2 (1 -\alpha )\) increases, the penalty for using significant input magnitudes increases, leading to smaller control magnitudes and larger limit errors.

  3.

    If the minimization of the objective function J(u) is set as the asymptotic objective of the iterations, the properties and magnitude of the limit error \(e_{\infty }\) are defined by the parameter \(\gamma = (1-\alpha )\varepsilon ^2\). The value of this product is chosen by the user and is likely to be small. It does not define \(\alpha \) or \(\varepsilon \) uniquely, leaving room for design considerations. For example, if \(\varepsilon ^2 \ge \gamma \) is chosen to provide desirable convergence properties of the operator L, the value of \(\alpha \) (and hence the magnitude of the limit \(e_{\infty }\)) is computed from \(\alpha = 1 - \varepsilon ^{-2}\gamma \). For discrete or continuous state space systems, desirable convergence properties may include a need to avoid high feedback gains in the Riccati matrix.

Proof of Theorem 9.4 Using the error evolution in Algorithm 9.2, note that \(r(\alpha L)=\Vert \alpha L\Vert \le \alpha < 1\) and hence the algorithm converges to a limit (in the norm topology) as required. The value of \(e_{\infty }\) is obtained by replacing \(e_{k+1}\) and \(e_k\) by \(e_{\infty }\) which gives the required result. As L is invertible,

$$\begin{aligned} \begin{array}{rl} &{} (I + \varepsilon ^{-2} GG^*)e_{\infty } ~= ~ \alpha e_{\infty } + (1 - \alpha ) (r-d) \quad \quad \quad \quad \\ so ~~ that ~~\quad &{} ((1 - \alpha )I + \varepsilon ^{-2} GG^*)e_{\infty } = (1 - \alpha ) (r-d)\quad \quad \quad \quad \end{array} \end{aligned}$$
(9.66)

which provides the desired unique solution as \((1 - \alpha )I + \varepsilon ^{-2} GG^*\) is invertible. Finally, using the techniques of Theorem 2.18, the input minimizing J(u) is the solution of the equation \(\varepsilon ^2 (1 - \alpha ) u = G^*e\) so that \(\varepsilon ^2 (1 - \alpha ) Gu = GG^*e\). That is, \(\varepsilon ^2 (1 - \alpha ) (r-d-e) = GG^*e\) which gives the required solution.   \(\square \)
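
The limit characterization can be illustrated numerically. The following sketch (an illustrative check, not part of the formal development) iterates the error evolution of Algorithm 9.2 for arbitrary finite-dimensional data with Euclidean inner products and compares the result with the closed-form limit (9.64); the matrix G, the signal \(r-d\) and the parameters \(\varepsilon ^2\), \(\alpha \) are all assumed, illustrative values.

```python
# Hedged numerical check of Theorem 9.4 (illustrative finite-dimensional data,
# Euclidean inner products): iterate e_{k+1} = L[alpha*e_k + (1-alpha)(r-d)]
# and compare with the closed-form limit (9.64).
import numpy as np

rng = np.random.default_rng(0)
n_y, n_u = 4, 3
G = rng.standard_normal((n_y, n_u))       # stands in for the plant operator G
r_minus_d = rng.standard_normal(n_y)      # the signal r - d
eps2, alpha = 0.5, 0.3                    # illustrative epsilon^2 and relaxation parameter

L = np.linalg.inv(np.eye(n_y) + G @ G.T / eps2)

e = r_minus_d.copy()                      # e_0 = r - d for the initial input u_0 = 0
for _ in range(500):
    e = L @ (alpha * e + (1.0 - alpha) * r_minus_d)

e_inf = np.linalg.solve(np.eye(n_y) + G @ G.T / (eps2 * (1.0 - alpha)), r_minus_d)  # Eq. (9.64)
print(np.allclose(e, e_inf))              # expected: True
```

Varying \(\alpha \) and \(\varepsilon ^2\) in such a sketch illustrates the trade-off described in the list above: larger values of \((1-\alpha )\varepsilon ^2\) produce smaller limit inputs and larger limit errors.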

Finally, the feedforward version 9.3 of the relaxed algorithm has the following convergence properties.

Theorem 9.5

(Error Convergence in Relaxed NOILC: The Feedforward Case) Application of the relaxed feedforward NOILC Algorithm 9.3 with \(0 < \alpha \le 1\) and \(0 < \beta \le 1\) to the dynamics \(y=Gu+d\) with reference r and initial input \(u_0\) has the following convergence properties

$$\begin{aligned} \lim _{k \rightarrow \infty } e_k = e_{\infty } \quad where \quad e_{\infty } = \left[ (\alpha - \beta )I + \beta L \right] e_{\infty } + (1 - \alpha )(r-d). \end{aligned}$$
(9.67)

In particular, if \(~\alpha =1\) and \(0 < \beta \le 1\), then the error sequence converges to the orthogonal projection of \(e_0\) onto the closed subspace \(ker[G^*]\).

Proof

The proof when \(\alpha < 1\) follows immediately as \((\alpha - \beta ) I \le (\alpha - \beta )I + \beta L = \alpha I - \beta (I-L) \le \alpha I\) and hence the spectral radius \(r\left( (\alpha - \beta )I + \beta L\right) < 1\). The details are left as an exercise for the reader. If \(\alpha = 1\) and \(0 < \beta \le 1\), then convergence is guaranteed as,

$$\begin{aligned} \begin{array}{c} L_{\beta }=(1 - \beta )I + \beta L = I- \beta \varepsilon ^{-2}GG^*(I+\varepsilon ^{-2}GG^*)^{-1} \quad \\ with \quad \left( 1 - \frac{\beta \varepsilon ^{-2}\Vert G^*\Vert ^2}{1+\varepsilon ^{-2}\Vert G^*\Vert ^2}\right) I \le L_{\beta } \le I. \end{array} \end{aligned}$$
(9.68)

If \(\Vert L_{\beta }\Vert < 1\), the proof follows as \(\Vert e_{k+1}\Vert \le \Vert L_{\beta }\Vert \Vert e_k\Vert \) for all \(k \ge 0\) and hence \(e_{\infty }=0\). If \(\Vert L_{\beta }\Vert =1\), then Theorem 5.9 indicates that \(e_{\infty }=0\) if \(e_0 \in \overline{\mathscr {R}[I - L_{\beta }]} = \overline{\mathscr {R}[GG^*]}\). The orthogonal complement of this closed subspace is \(ker[I-L_{\beta }]=ker[GG^*] = ker[G^*]\) and it is a simple calculation to show that \(L_{\beta }v = v\) for all \(v \in ker[G^*]\). The result follows by writing \(e_0= e_0^{(1)} + e_0^{(2)}\) with \(e_0^{(1)} \in \overline{\mathscr {R}[I-L_{\beta }]}\) and \(e_0^{(2)} \in ker[G^*]\).   \(\square \)
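
The \(\alpha = 1\) case can also be illustrated numerically. The following hedged sketch uses illustrative finite-dimensional data with \(Q=R=I\) (so that \(G^*=G^T\)), iterates \(e_{k+1}=L_{\beta }e_k\) and compares the limit with the orthogonal projection of \(e_0\) onto \(ker[G^T]\).

```python
# Hedged check of the alpha = 1 case of Theorem 9.5: the relaxed feedforward
# iteration converges to the orthogonal projection of e_0 onto ker[G*].
import numpy as np

rng = np.random.default_rng(1)
n_y, n_u = 5, 2                            # more outputs than inputs, so ker[G^T] is non-trivial
G = rng.standard_normal((n_y, n_u))
eps2, beta = 1.0, 0.7                      # illustrative values

L = np.linalg.inv(np.eye(n_y) + G @ G.T / eps2)
L_beta = (1.0 - beta) * np.eye(n_y) + beta * L

e0 = rng.standard_normal(n_y)
e = e0.copy()
for _ in range(20000):
    e = L_beta @ e

P_range = G @ np.linalg.pinv(G)            # orthogonal projector onto R[G]
e_proj = (np.eye(n_y) - P_range) @ e0      # projection of e_0 onto ker[G^T]
print(np.allclose(e, e_proj, atol=1e-6))   # expected: True
```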

9.2 Robustness of NOILC: Feedforward Implementation

The reader will note that, being monotonic, NOILC Algorithm 9.1 shares many properties with gradient-based and inverse-model-based algorithms. This section considers the feedforward implementation of NOILC and demonstrates that it also has robustness properties that are strongly related to those of gradient and inverse algorithms. The model of process dynamics is, again, \(y=Gu+d\) and the underlying feedforward input update equation used for off-line computation is

$$\begin{aligned} u_{k+1}=u_k + \varepsilon ^{-2}G^*(I+\varepsilon ^{-2}GG^*)^{-1}e_k = u_k + \varepsilon ^{-2}G^*Le_k. \end{aligned}$$
(9.69)

where \(e_k\) is the measured error signal on iteration k. The following analysis first considers the computation of \(u_{k+1}\). Robustness analysis is then separated into two cases, namely left and right multiplicative modelling errors. It is based on monotonicity of carefully selected quadratic forms and again introduces a need for positivity of multiplicative modelling errors. As was the case for inverse and gradient algorithms, the geometry of the input and output spaces is central to the analysis. For example, \(\mathscr {Y}\) has an orthogonal subspace decomposition of the form

$$\begin{aligned} \begin{array}{c} \mathscr {Y} = \overline{\mathscr {R}[G]} \oplus ker[G^*], ~~with ~~ ker[G^*] = ker[G^*L], \\ and~~~~\overline{\mathscr {R}[G^*G]}= \overline{\mathscr {R}[G^*LG]} = \overline{\mathscr {R}[G^*]}, \end{array} \end{aligned}$$
(9.70)

the second equality follows from \(G^*L = (I+\varepsilon ^{-2}G^*G)^{-1}G^*\) and the final equality from \(ker[G^*G] = ker[G]\) and \(G^*LG = G^*G(I+\varepsilon ^{-2}G^*G)^{-1}\).

9.2.1 Computational Aspects of Feedforward NOILC

The signal \(e_k\) is the observed/measured tracking error on iteration k and is not necessarily equal to the error predicted by the plant model. The input update can nevertheless be computed off-line using the model by noting that the input change \(u_{k+1} - u_k = \varepsilon ^{-2}G^*Le_k\) is exactly the input generated by the model in one iteration of NOILC from a starting condition of zero input, using the value \(d=0\) and a “reference signal” equal to the measured error \(e_k\) (a simple supervector-based sketch of this computation is given after the list below). For example,

  1. 1.

    for discrete state space Algorithm 9.4 for the model S(ABC), the update \(\varDelta u_{k+1} = u_{k+1}-u_k=\varepsilon ^{-2}G^*Le_k\) is computed from the equations used in feedback Implementation Two of Sect. 9.1.3 written in the form,

    $$\begin{aligned} z_{k+1}(t+1)&= A z_{k+1}(t)+B\varDelta u_{k+1}(t), \quad \quad \underbrace{z_{k+1}(0)=0}, \nonumber \\ \varDelta u_{k+1}(t)&= R^{-1}(t)B^T\left( - K(t)z_{k+1}(t) + \xi _{k+1}(t) \right) ,~ \nonumber \\ \xi _{k+1}(t)&= \left( I+ \varepsilon ^{-2}\tilde{K}(t+1)BR^{-1}(t)B^T \right) ^{-1} \psi _{k+1}(t), \quad ~~~ with, \nonumber \\ \psi _{k+1}(t)&= A^T\xi _{k+1}(t+1) + C^TQ(t+1) \underbrace{e_k(t+1)},\quad ~~~\xi _{k+1}(N)=0. \end{aligned}$$
    (9.71)

    where K(t) remains unchanged and is that defined by model data ABCD plus Q(t), R(t) and \(\varepsilon ^2\).

  2. 2.

    For continuous state space Algorithm 9.5 for the model S(ABC), the update \(\varDelta u_{k+1} = u_{k+1}-u_k=\varepsilon ^{-2}G^*Le_k\) is computed from the equations used in feedback Implementation Two of Sect. 9.1.6 written in the form,

    $$\begin{aligned} \dot{z}_{k+1}(t)&= A z_{k+1}(t)+B\varDelta u_{k+1}(t), \quad \quad \underbrace{z_{k+1}(0)=0}, \nonumber \\ \varDelta u_{k+1}(t)&= R^{-1}(t)B^T\left( - K(t)z_{k+1}(t) + \xi _{k+1}(t) \right) ,~ \nonumber \\ \dot{\xi }_{k+1}(t)&= \left( A^T - \varepsilon ^{-2} K(t)BR^{-1}(t)B^T\right) \xi _{k+1}(t) - C^TQ(t)\underbrace{e_k(t)}. \end{aligned}$$
    (9.72)

    where K(t) remains unchanged and is that defined by model data.
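
For finite-dimensional problems, the same input change \(\varDelta u_{k+1} = \varepsilon ^{-2}G^*Le_k\) can alternatively be evaluated off-line directly from a lifted (supervector) model rather than from the Riccati equations above. The following hedged sketch assumes unit weights \(Q=R=I\) (so that \(G^*=G^T\)), outputs \(y(0),\ldots ,y(N)\) stacked into a single vector and a zero initial state; the state space data and signals are illustrative only.

```python
# Hedged supervector sketch of the feedforward NOILC update (9.69):
# Delta u = eps^{-2} G^T (I + eps^{-2} G G^T)^{-1} e_k   (Q = R = I assumed).
import numpy as np

def lifted_matrix(A, B, C, D, N):
    """Lower block-triangular matrix mapping (u(0),...,u(N)) to (y(0),...,y(N)), x(0) = 0."""
    n_y, n_u = D.shape
    markov = [D] + [C @ np.linalg.matrix_power(A, j) @ B for j in range(N)]
    G = np.zeros(((N + 1) * n_y, (N + 1) * n_u))
    for i in range(N + 1):
        for j in range(i + 1):
            G[i * n_y:(i + 1) * n_y, j * n_u:(j + 1) * n_u] = markov[i - j]
    return G

def noilc_update(G, e_k, eps2):
    """Input change eps^{-2} G* L e_k with G* = G^T (unit weights assumed)."""
    M = np.eye(G.shape[0]) + (G @ G.T) / eps2
    return (G.T @ np.linalg.solve(M, e_k)) / eps2

# Illustrative model data and measured error supervector
A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]]);             D = np.array([[0.1]])
N = 20
G = lifted_matrix(A, B, C, D, N)
du = noilc_update(G, np.ones(N + 1), eps2=0.25)   # Delta u_{k+1} for the measured error e_k = 1
```

For long time intervals or multivariable systems the lifted matrices become large, which is one practical motivation for the Riccati-based state feedback/feedforward realizations described above.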

9.2.2 The Case of Right Multiplicative Modelling Errors

Now suppose that G is a model of the actual plant and that the plant is represented by the operator GU where \(U: \mathscr {U} \rightarrow \mathscr {U}\) represents a right multiplicative modelling error. The case \(U=I\) corresponds to the model predicting plant behaviour exactly. Using the plant input/output relationship \(y=GUu+d_U\) then gives the error update equation

$$\begin{aligned} e_{k+1} = L_U e_k, \quad for ~~ all ~~k \ge 0, \quad where \quad L_U = \left( I - \varepsilon ^{-2} GUG^*L\right) \end{aligned}$$
(9.73)

noting that, in general, \(L_U \ne L_U^*~\) and that \( L_Ue=e~\) if \(e \in ker[G^*]\). Note, in particular, that any component of \(e_0 \in ker[G^*L]=ker[G^*]\) is unchanged by the iterative process. The only significant evolution occurs in \(\overline{\mathscr {R}[G]}\), which is both L-invariant and \(L_U\)-invariant and is a Hilbert space in its own right. This can be expressed more clearly by writing \(e_0 = e_0^{(1)} + e_0^{(2)}\) with \(e_0^{(1)} \in \overline{\mathscr {R}[G]}\) and \(e_0^{(2)} \in ker[G^*]\). A simple calculation gives,

$$\begin{aligned} e_k = L_U^ke_0 = L_U^ke_0^{(1)} + e_0^{(2)}~~and ~~ \Vert e_k\Vert \ge \Vert e_0^{(2)}\Vert ,~~ \quad for ~~ all ~~k \ge 0. \end{aligned}$$
(9.74)

Convergence is hence described entirely by the properties of the restriction of \(L_U\) to the closed subspace \(\overline{\mathscr {R}[G]}\). The best that can be achieved is that \(\lim _{k \rightarrow \infty } e_k = e_0^{(2)}\).

Convergence in the presence of the modelling error U is analysed by constructing a topologically equivalent norm in \(\mathscr {Y}\) (and hence in \(\overline{\mathscr {R}[G]}\)). Noting that \(L=L^* > 0\), Theorem 9.1 then indicates that the inner product

$$\begin{aligned} \langle e,w \rangle _0 = \langle e,Lw \rangle _{\mathscr {Y}} \quad generates~~ a ~~ norm \quad \Vert e\Vert _0=\langle e,Le\rangle ^{1/2}_{\mathscr {Y}} \end{aligned}$$
(9.75)

that is topologically equivalent to \(\Vert e\Vert _{\mathscr {Y}}\).

The approach to robustness analysis considers the possibility of ensuring the monotonicity property \(\Vert e_{k+1}\Vert _0 < \Vert e_k\Vert _0\) for all \(k \ge 0\). That is,

$$\begin{aligned} (Robust~~Monotonicity) ~~ \quad \langle e_{k+1},Le_{k+1} \rangle _{\mathscr {Y}} < \langle e_k,Le_k \rangle _{\mathscr {Y}}, \quad for~~all ~~ k \ge 0, \end{aligned}$$
(9.76)

which, using the error evolution equation, can be written in the form

$$\begin{aligned} \begin{array}{c} \Vert e_{k+1}\Vert _0^2 = \Vert e_k\Vert _0^2 + \varepsilon ^{-2} \langle \eta _k, \left[ \varepsilon ^{-2}U^*G^*LGU - \left( U+U^*\right) \right] \eta _k \rangle _{\mathscr {U}}~\\ where \quad \quad \eta _k = G^*Le_k \in \mathscr {R}[G^*] \subset \mathscr {U}. \end{array} \end{aligned}$$
(9.77)

Note that \(\eta _k = 0\) if, and only if, \(e_k \in ker[G^*L]=ker[G^*]\) i.e. \(e_k^{(1)} =0\). Also, as L has a bounded inverse, \(\mathscr {R}[G^*L]= \mathscr {R}[G^*]\) and \(\eta _k\) can take any value in \(\mathscr {R}[G^*]\).
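
The identity (9.77) is easily verified numerically. The following hedged sketch (illustrative finite-dimensional data, Euclidean inner products so that \(G^*=G^T\)) iterates \(e_{k+1}=L_Ue_k\) and checks that the change in the squared \(\Vert \cdot \Vert _0\) norm equals the quadratic form in \(\eta _k\).

```python
# Hedged numerical check of identity (9.77) for a right multiplicative error U.
import numpy as np

rng = np.random.default_rng(2)
n_y, n_u = 5, 3
G = rng.standard_normal((n_y, n_u))
U = np.eye(n_u) + 0.2 * rng.standard_normal((n_u, n_u))   # illustrative modelling error
eps2 = 1.0

L = np.linalg.inv(np.eye(n_y) + G @ G.T / eps2)
L_U = np.eye(n_y) - (G @ U @ G.T @ L) / eps2

e = rng.standard_normal(n_y)
for _ in range(5):
    eta = G.T @ L @ e
    lhs = (L_U @ e) @ L @ (L_U @ e) - e @ L @ e            # ||e_{k+1}||_0^2 - ||e_k||_0^2
    W = (U.T @ G.T @ L @ G @ U) / eps2 - (U + U.T)
    assert np.isclose(lhs, (eta @ W @ eta) / eps2)         # identity (9.77)
    e = L_U @ e
print("identity (9.77) verified on this example")
```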

The following Theorem describes robust monotonic convergence in the new norm topology.

Theorem 9.6

(Robustness of NOILC with Right Multiplicative Errors) Consider NOILC Algorithm 9.1 in its feedforward implementation using the measured data \(e_k\). Using the notation defined above, a necessary and sufficient condition for the norm sequence \(\{\Vert e_k\Vert _0\}_{k \ge 0}\) to satisfy the monotonicity condition \(\Vert e_{k+1}\Vert _0 < \Vert e_k\Vert _0\), for all \(k \ge 0\), and for all \(e_0\) with \(e_0^{(1)}\ne 0\), is that

$$\begin{aligned} Condition~~One - U+U^*&> \varepsilon ^{-2}U^*G^*LGU,&~~on ~~ \mathscr {R}[G^*]. \end{aligned}$$
(9.78)

In these circumstances,

  1. 1.

    The modelling error satisfies the condition \(\quad ker[U] \cap \mathscr {R}[G^*]=\{0\}\).

  2. 2.

    For any initial error \(e_0\), the norm sequence \(\{\Vert e_k\Vert _0\}_{k \ge 0}\) is bounded and the limit \(\lim _{k \rightarrow \infty } \Vert e_k\Vert _0 = E_0 \ge 0\) exists. The limit \(e_{\infty } = \lim _{k \rightarrow \infty } e_k = 0\) is therefore achieved if, and only if, \(E_0=0\).

  3. 3.

    A sufficient condition for Condition One to be satisfied is that the operator inequality holds either on the closure \(\overline{\mathscr {R}[G^*]}\) or on the full input space \(\mathscr {U}\). If \(ker[G]=\{0\}\), then, as \(\overline{\mathscr {R}[G^*]}=\mathscr {U}\), these two alternatives are identical.

Finally, in the new topology,

  1. 1.

    the induced norm of \(L_U\) in \(\mathscr {Y}\) satisfies \(\Vert L_U\Vert _0 \le 1\) and, more specifically,

  2. 2.

    the induced norm of the restriction of \(L_U\) to \(\overline{\mathscr {R}[G]}\) in the new topology also satisfies

    $$\begin{aligned} \Vert L_U\Vert _0 \le 1. \end{aligned}$$
    (9.79)

    In particular, if \(\mathscr {R}[G]\) is finite dimensional, then this norm is strictly less than unity and,

    $$\begin{aligned} for ~~ all~~starting ~~ conditions \quad e_0 \in \mathscr {Y},\quad \quad \lim _{k \rightarrow \infty } e_k = e_0^{(2)} \in ker[G^*]. \end{aligned}$$
    (9.80)

Note: If \(ker[G^*]=\{0\}\), \(\mathscr {R}[G]\) is dense in \(\mathscr {Y}\) and monotonicity occurs for all \(e_0 \in \mathscr {Y}\).

Proof

The first condition follows by noting that \(U\,+\,U^*>0\) on \(\mathscr {R}[G^*]\) by assumption as \(U^*G^*LGU \ge 0\). The assumed inequality would then be violated if there existed a non-zero \(u \in \mathscr {R}[G^*]\) satisfying \(Uu = 0\). For the remainder of the proof, it is only necessary to consider the case when \(k=0\). The necessity of the conditions for monotonicity follows as, if \(e_0^{(1)}\ne 0\), then \(\eta _0 \ne 0\) and, by suitable choice of \(e_0^{(1)}\), \(\eta _0\) can take any value in \(\mathscr {R}[G^*]\). Condition One then follows from Eq. (9.77) which then requires that the second term is strictly negative for all non-zero \(\eta _0 \in \mathscr {R}[G^*]\). Sufficiency follows easily from Eq. (9.77). The existence of \(E_0 \ge 0\) follows from positivity of norms and monotonicity whilst the comment on the case when \(E_0=0\) is a consequence of the definition of convergence in norm to zero. Next the inequality holds on \(\mathscr {R}[G^*]\) if it holds on any subspace that contains it. The proofs that \(\Vert L_U\Vert _0 \le 1\) are a direct consequence of the monotonicity of the norms on \(\mathscr {Y}\) and hence on any \(L_U\)-invariant subspace. Finally, if \(\mathscr {R}[G]\) is finite dimensional, then \(\mathscr {R}[G]=\overline{\mathscr {R}[G]}\). Choose a non-zero \(e_0 \in \overline{\mathscr {R}[G]}\) arbitrarily. Then \(\eta _0 \ne 0\) and the resultant inequality \(\Vert e_{1}\Vert _0 < \Vert e_0\Vert _0\) plus the compactness of the unit sphere leads to the strict bound \(\Vert L_U\Vert _0 < 1\) on the norm of the restriction of \(L_U\) to \(\overline{\mathscr {R}[G]}\) for, otherwise, there would exist a non-zero \(e \in \overline{\mathscr {R}[G]}\) such that \(\Vert L_Ue\Vert _0 = \Vert e\Vert _0\). This contradicts the proven monotonicity. The proof of convergence to \(e_0^{(2)}\) follows.   \(\square \)

In principle, this result provides a good description of robustness in the presence of the modelling error. The practical problems implicit in checking Condition One lie in the presence of the operator L, which depends on \(GG^*\) in a “nonlinear” way. The following result simplifies the condition by providing two simpler, but more conservative, alternative sufficient conditions.

Theorem 9.7

(Robust Convergence and Boundedness: Alternative Conditions) Using the assumptions of Theorem 9.6, its conclusions remain valid if (a sufficient condition) Condition One is replaced by either

$$\begin{aligned} Condition~~Two - U+U^*&> \frac{\varepsilon ^{-2}\Vert G\Vert ^2}{1 + \varepsilon ^{-2}\Vert G\Vert ^2}U^*U,&~~on ~~ \mathscr {R}[G^*], \nonumber \\ or,~ \quad Condition~~Three - U+U^*&> \varepsilon ^{-2}U^*G^*GU,&~~on ~~ \mathscr {R}[G^*]. \end{aligned}$$
(9.81)

A sufficient condition for each Condition to be satisfied is that the associated operator inequality holds either on \(\overline{\mathscr {R}[G^*]}\) or on the full space \(\mathscr {U}\).

Proof

The sufficiency of Condition Two follows from Condition One and the identity \(\varepsilon ^{-2}G^*LG = (I+\varepsilon ^{-2}G^*G)^{-1}\varepsilon ^{-2}G^*G = I - (I+\varepsilon ^{-2}G^*G)^{-1} \le \left( \frac{\varepsilon ^{-2}\Vert G\Vert ^2}{1+\varepsilon ^{-2}\Vert G\Vert ^2}\right) I\) from Theorem 9.1 (with G replaced by \(G^*\)). Condition Three also follows from Theorem 9.1 as \(L \le I\).   \(\square \)

Note: Of the three given, Condition One is clearly the least conservative; Condition Two adds conservatism by replacing \(G^*LG\) by a constant upper bound, whilst Condition Three replaces L by a constant upper bound and leaves the factor \(G^*G\) in place. The replacement of \(\mathscr {R}[G^*]\) by \(\overline{\mathscr {R}[G^*]}\) or \(\mathscr {U}\) also adds conservatism but may make the checking of the condition easier in practice. A useful example of this relaxation also allows Conditions Two and Three to be combined into one, parameterized condition by writing \(L=\theta L + (1-\theta )L\) and noting that, for all \(\theta \in [0,1]\),

$$\begin{aligned} \varepsilon ^{-2}G^*LG \le \theta \frac{\varepsilon ^{-2}\Vert G\Vert ^2}{1 + \varepsilon ^{-2}\Vert G\Vert ^2}I + (1 - \theta )\varepsilon ^{-2} G^*G \end{aligned}$$
(9.82)

in the \(\mathscr {Y}\) topology. Applications of this bound include,

Theorem 9.8

(Robust Convergence and Boundedness: Conditions Combined) Using the notation of the discussion above, suppose that U has a bounded inverse \(\hat{U}\) on \(\mathscr {U}\) and that, for some \(\theta \in [0,1]\),

$$\begin{aligned} \begin{array}{c} Condition~~Four - \hat{U} + \hat{U}^* > \theta \beta _{I}I + (1 - \theta )\beta _{G}G^*G \quad on ~~ \mathscr {U} \\ where \quad \beta _{I} = \frac{\varepsilon ^{-2}\Vert G\Vert ^2}{1 + \varepsilon ^{-2}\Vert G\Vert ^2} \quad and \quad \beta _{G} = \varepsilon ^{-2}. \end{array} \end{aligned}$$
(9.83)

Then Condition One holds and the monotonicity and convergence predictions of Theorem 9.6 are guaranteed.

Proof

Consider Condition One with \(\mathscr {R}[G^*]\) replaced by \(\mathscr {U}\) and write it in the form \(\hat{U} + \hat{U}^* > \varepsilon ^{-2}G^*LG \) on \(\mathscr {U}\). Using the mixed bound (9.82) for \(\varepsilon ^{-2}G^*LG\) then gives the result.   \(\square \)
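
For finite-dimensional (supervector) data the four conditions can be compared directly by numerical tests of positive definiteness. The following hedged sketch assumes Euclidean norms (\(Q=R=I\)) and checks each condition on the whole input space, which is a sufficient version of the requirement on \(\mathscr {R}[G^*]\); all data values are illustrative.

```python
# Hedged comparison of Conditions One to Four (Theorems 9.6-9.8) for matrix data.
import numpy as np

def pos_def(M, tol=1e-10):
    return np.min(np.linalg.eigvalsh(0.5 * (M + M.T))) > tol

def right_error_conditions(G, U, eps2, theta=0.5):
    L = np.linalg.inv(np.eye(G.shape[0]) + G @ G.T / eps2)
    g2 = np.linalg.norm(G, 2) ** 2                       # ||G||^2 (Euclidean norms assumed)
    beta_I, beta_G = (g2 / eps2) / (1.0 + g2 / eps2), 1.0 / eps2
    sym = U + U.T
    cond1 = pos_def(sym - (U.T @ G.T @ L @ G @ U) / eps2)            # Condition One
    cond2 = pos_def(sym - beta_I * (U.T @ U))                        # Condition Two
    cond3 = pos_def(sym - beta_G * (U.T @ G.T @ G @ U))              # Condition Three
    U_hat = np.linalg.inv(U)                                         # Condition Four needs U invertible
    cond4 = pos_def(U_hat + U_hat.T - theta * beta_I * np.eye(G.shape[1])
                    - (1.0 - theta) * beta_G * (G.T @ G))
    return cond1, cond2, cond3, cond4

rng = np.random.default_rng(3)
G = rng.standard_normal((5, 3))
U = 1.1 * np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # illustrative modelling error
print(right_error_conditions(G, U, eps2=2.0))
```

Consistent with the Note above, experiments of this kind typically show Condition One holding for a wider range of U than the more conservative Conditions Two to Four.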

Some insight into the implications of these NOILC robustness conditions is obtained by considering the case of no modelling error when \(U=I\) and linking the results to conditions for robustness of the inverse model and gradient algorithms in Chaps. 6 and 7. More precisely,

  1. 1.

    Condition Two is always satisfied when \(U=I\) as it then reduces to \(2I > \beta _I I\) as the “gain” parameter \(\beta _I = \frac{\varepsilon ^{-2}\Vert G\Vert ^2}{1+\varepsilon ^{-2}\Vert G\Vert ^2} < 1\).

  2. 2.

    More generally, Condition Two is algebraically identical to the robustness condition for the left inverse model algorithm with right multiplicative perturbation U. In particular, it is satisfied if,

    $$\begin{aligned} \Vert (I -\beta _I U)\Vert ^2 = r\left( (I-\beta _I U)^*(I-\beta _I U) \right) < 1. \end{aligned}$$
    (9.84)
  3. 3.

    If \(U=I\), then Condition Three reduces to \(2I > \varepsilon ^{-2}G^*G\) which is satisfied if \(\varepsilon ^{2} > \frac{1}{2}\Vert G\Vert ^2\). That is, the weighting on the control component of the NOILC objective function is bounded from below by this condition. The eigenstructure ideas of Sect. 9.1.7 then imply that the achievable spectral bandwidth is limited, as is the achievable convergence rate.

  4. 4.

    More generally, Condition Three is identical to that derived for the gradient algorithm with right multiplicative perturbation U. In particular, it is satisfied if,

    $$\begin{aligned} r\left( (I-\beta _G GUG^*)^*(I-\beta _G GUG^*) \right) < 1. \end{aligned}$$
    (9.85)
  5. 5.

    The mixed condition of Theorem 9.8 provides a continuous link between the inverse and gradient conditions and will be seen to have possible value in the later development of frequency domain robustness criteria.

Finally, the nature of the convergence can, in general, be refined by considering the detailed form of U and G in the application of the results. The monotonicity of the error sequence measured by the \(\Vert \cdot \Vert _0\) norm implies that the induced operator norm \(\Vert L_U\Vert _0 \le 1\) in the topology induced by the inner product \(\langle e,w \rangle _0 = \langle e,Lw \rangle _{\mathscr {Y}}\). In particular, \(\Vert L_U\Vert _0 = 1\) if \( ker[G^*]\ne \{0\}\). More generally, its restriction to \(\overline{\mathscr {R}[G]}\) can have unit norm even if \(ker[G^*]=\{0\}\), a property that occurs only in the infinite dimensional case and leads to technical problems beyond the chosen scope of this text. The result has shown that \(\Vert L_U\Vert _0 < 1\) on \(\overline{\mathscr {R}[G]}\) if it is finite dimensional. The following result describes a case that is particularly relevant to finite or (a class of) infinite dimensional problems where \(ker[G]=\{0\}\).

Theorem 9.9

(Robust Convergence to \(e_0^{(2)} \in ker[G^*]\)) Suppose that the plant GU has model G and right multiplicative modelling error U. Suppose also that there exists a real number \(\varepsilon ^2_0 > 0\) such that

$$\begin{aligned} G^*G \ge \varepsilon _0^2 I \quad \quad in \quad \mathscr {R}[G^*]. \end{aligned}$$
(9.86)

Suppose also that either

$$\begin{aligned} Condition ~Two~- ~(A) \quad U + U^*&\ge \frac{\varepsilon ^{-2}\Vert G\Vert ^2}{1+\varepsilon ^{-2}\Vert G\Vert ^2} U^*U +\,\varepsilon _0^2 I&\quad in \quad \mathscr {R}[G^*], \nonumber \\ or \quad Condition ~Three~- ~(B) \quad U + U^*&\ge \varepsilon ^{-2}U^*G^*GU +\,\varepsilon _0^2 I&\quad in \quad \mathscr {R}[G^*], \end{aligned}$$
(9.87)

or, U has a bounded inverse on \(\mathscr {U}\), \(\theta \in [0,1]\) and

$$\begin{aligned} Condition ~~ Four ~~-~~ (C) \quad \hat{U} + \hat{U}^* > \theta \beta _{I}I + (1 - \theta )\beta _{G}G^*G + \varepsilon _0^2 I ~~on ~~ \mathscr {U} \end{aligned}$$
(9.88)

holds, then \(\Vert L_U\Vert _0 < 1\) on \(\overline{\mathscr {R}[G]}\) and the error sequence converges to the component \(e_0^{(2)}\in ker[G^*]\).

Proof

Note that a proof that \(\Vert L_U\Vert _0 < 1\) on \(\overline{\mathscr {R}[G]}\) proves the convergence to \(e_0^{(2)}\). Next consider the use of (A). From (9.77) with \(\eta _k = G^*Le_k\) and \(e_k\) arbitrary,

$$\begin{aligned} \Vert e_{k+1}\Vert _0^2&= \Vert e_k\Vert _0^2 + \varepsilon ^{-2} \langle \eta _k, \left[ \varepsilon ^{-2}U^*G^*LGU - \left( U+U^*\right) \right] \eta _k \rangle _{\mathscr {U}}~ \nonumber \\&\le \Vert e_k\Vert _0^2 + \varepsilon ^{-2} \langle \eta _k, \left[ \frac{\varepsilon ^{-2}\Vert G\Vert ^2}{1+\varepsilon ^{-2}\Vert G\Vert ^2}U^*U - \left( U+U^*\right) \right] \eta _k \rangle _{\mathscr {U}}~\nonumber \\&\le \Vert e_k\Vert _0^2 - \varepsilon ^{-2} \varepsilon _0^2\Vert \eta _k\Vert ^2_{\mathscr {U}}. \end{aligned}$$
(9.89)

Now set \(e_k = Gw_k \in \mathscr {R}[G]\) and note that, using Theorem 9.1 with G replaced by \(G^*\), and the inequality \(\Vert e\Vert \le \Vert G\Vert \Vert w\Vert \), it follows that

$$\begin{aligned} \Vert \eta _k\Vert ^2_{\mathscr {U}}&= \Vert G^*LGw_k\Vert ^2_{\mathscr {U}} = \langle w_k,G^*LGG^*LG w_k \rangle _{\mathscr {U}} \nonumber \\&\ge \left( \frac{\varepsilon _0^2}{1+\varepsilon ^{-2}\Vert G\Vert ^2}\right) ^2 \Vert w_k\Vert ^2_{\mathscr {U}} \ge \left( \frac{\Vert G\Vert ^{-1}\varepsilon _0^2}{1+\varepsilon ^{-2}\Vert G\Vert ^2}\right) ^2 \Vert e_k\Vert ^2_{\mathscr {Y}}. \end{aligned}$$
(9.90)

This relation and the topological equivalence of the norms imply that, for any \(e_k \in \mathscr {R}[G]\), the inequality \(\Vert e_{k+1}\Vert _0^2 \le ( 1-\lambda )\Vert e_k\Vert _0^2\) holds for some \(\lambda > 0\). The relation is therefore also valid in the closure \(\overline{\mathscr {R}[G]}\). The required inequality \(\Vert L_U\Vert _0 < 1\) then follows. Finally, for (B), note that \(L \le I\) and write

$$\begin{aligned} \Vert e_{k+1}\Vert _0^2&= \Vert e_k\Vert _0^2 + \varepsilon ^{-2} \langle \eta _k, \left[ \varepsilon ^{-2}U^*G^*LGU - \left( U+U^*\right) \right] \eta _k \rangle _{\mathscr {U}}~\nonumber \\&\le \Vert e_k\Vert _0^2 + \varepsilon ^{-2} \langle \eta _k, \left[ \varepsilon ^{-2}U^*G^*GU - \left( U+U^*\right) \right] \eta _k \rangle _{\mathscr {U}}~\nonumber \\&\le \Vert e_k\Vert _0^2 - \varepsilon ^{-2} \varepsilon _0^2\Vert \eta _k\Vert ^2_{\mathscr {U}}. \end{aligned}$$
(9.91)

The proof is then concluded using a similar argument to that used for (A). (C) is proved in a similar way noting that

$$\begin{aligned} \Vert e_{k+1}\Vert _0^2 \le \Vert e_k\Vert _0^2 -\varepsilon ^{-2}\varepsilon _0^2 \Vert U\eta _k\Vert ^2_{\mathscr {U}} \end{aligned}$$
(9.92)

and, using invertibility of U, the existence of a real number \(\alpha >0\) such that \(U^*U \ge \alpha I\). The details are left as an exercise for the reader.   \(\square \)

The robustness results are reassuring in that they indicate substantial robustness of the algorithms whenever the defined positivity conditions are satisfied. To be useful in practice, the conditions must be converted into checkable conditions, preferably written in terms of quantities familiar to practicing engineers. One such example is now described.

9.2.3 Discrete State Space Systems with Right Multiplicative Errors

Consider the case when G can be represented by a discrete time, linear, time invariant, state space model S(ABCD) (or its equivalent supervector description) on the interval \(0 \le t \le N\) and has the transfer function matrix G(z). Consider Algorithm 9.4 where the weights Q and R are taken to be independent of sample number “t”. The actual plant model is assumed to be expressed in the form GU with right multiplicative modelling error \(U: \mathscr {U} \rightarrow \mathscr {U}\) which has its own state space model \(S(A_U,B_U,C_U,D_U)\) and \(\ell \times \ell \) transfer function matrix U(z). In this case, \(\mathscr {Y}\) and \(\mathscr {U}\) are finite dimensional and hence every vector subspace is closed.

In what follows, Conditions Two and Three of Theorem 9.7 are converted into frequency domain tests for robust convergence of the feedforward implementation.

Theorem 9.10

(Checking Condition Two for Discrete, State Space Systems) Let the model G, the actual plant GU and the NOILC objective function be as described above. Then a sufficient condition for Condition Two of  Theorem 9.7 to hold is that U(z) is asymptotically stable and also that

$$\begin{aligned} RU(z) + U^T(z^{-1})R > \beta _{I} U^T(z^{-1})RU(z),~ for ~~ all ~~ |z|=1, \end{aligned}$$
(9.93)

where the “gain” parameter

$$\begin{aligned} \beta _{I} = \frac{\varepsilon ^{-2}\Vert G\Vert ^2}{ 1+ \varepsilon ^{-2}\Vert G\Vert ^2}. \end{aligned}$$
(9.94)

In these circumstances, the Feedforward NOILC algorithm converges to the component \(e_0^{(2)}\) of \(e_0\) in \(ker[G^*]\),  this convergence being monotonic in the norm \(\Vert e \Vert _0 = \sqrt{\langle e,Le \rangle }\). Finally, for computational purposes, \(\Vert G\Vert ^2\) can be replaced in the formula for \(\beta _{I}\) by its upper bound

$$\begin{aligned} \Vert G\Vert ^2 \le \sup _{|z|=1}~r\left( R^{-1}G^T(z^{-1})QG(z) \right) , \end{aligned}$$
(9.95)

a relationship that links the frequency domain condition to the weighting matrices Q and R.

Proof

First note that continuity and the compactness of the unit circle \(|z|=1\) imply that there must exist a real number \(\underline{g}>0\) such that \(RU(z) + U^T(z^{-1})R \ge \underline{g}R\) whenever \(|z|=1\) and hence U(z) is strictly positive real in the sense defined by Theorem 4.7. The same theorem can then be used, with a change in notation, to prove the operator inequality \(U+U^* \ge \underline{g}I >0\) on \(\mathscr {U}\) and hence that \(ker[U]=\{0\}\). The proof that \(U + U^* > \beta _I U^*U\) on \(\mathscr {U}\) using the frequency domain condition follows by applying Theorem 4.9 with a change in notation, replacing both G(z) and K(z) by U(z) and noting that \(\beta ^* > \beta _I\) and \(Uu \ne 0\) for all non-zero \(u \in \mathscr {U}\). Using the finite dimensional nature of \(\mathscr {U}\) and \(\mathscr {Y}\) then indicates that \(L_U\) has norm strictly less than unity on the closed subspace \(\mathscr {R}[G]\) which proves the convergence statement. Finally, as \(\frac{a}{1+a}\) is monotonically increasing in a, \(\beta _I\) can be replaced by the value obtained when \(\Vert G\Vert \) is substituted by any upper bound. The bound stated in the theorem was derived in Chap. 4 in Theorem 4.5.   \(\square \)
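
A hedged sketch of the frequency domain test (9.93) is given below: the unit circle is sampled on a grid and the matrix inequality is checked at each sample via an eigenvalue test. A finite grid is indicative rather than a proof, and the transfer function U(z), the weight R and the gain \(\beta _I\) used in the example are all illustrative assumptions.

```python
# Hedged grid check of the frequency domain inequality (9.93) on |z| = 1.
import numpy as np

def condition_two_on_circle(U_of_z, R, beta_I, n_grid=720, tol=1e-9):
    for w in np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False):
        z = np.exp(1j * w)
        Uz, Uzi = U_of_z(z), U_of_z(1.0 / z)
        M = R @ Uz + Uzi.T @ R - beta_I * (Uzi.T @ R @ Uz)
        # For real-coefficient U(z), M is Hermitian on the unit circle.
        if np.min(np.linalg.eigvalsh(0.5 * (M + M.conj().T))) <= tol:
            return False
    return True

# Illustrative scalar example: U(z) = (z + 0.5)/(z + 0.2), R = 1, beta_I = 0.6
U_of_z = lambda z: np.array([[(z + 0.5) / (z + 0.2)]])
print(condition_two_on_circle(U_of_z, R=np.array([[1.0]]), beta_I=0.6))   # expected: True
```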

Theorem 9.11

(Checking Condition Three for Discrete State Space Systems) A sufficient condition for Condition Three of  Theorem 9.7 to hold is that both G(z) and U(z) are asymptotically stable and also that

$$\begin{aligned} RU(z) + U^T(z^{-1})R > \beta _{G} U^T(z^{-1})G^T(z^{-1})QG(z)U(z),~ for ~~ all ~~ |z|=1, \end{aligned}$$
(9.96)

where the scalar \(\beta _{G} = \varepsilon ^{-2}\). The Feedforward NOILC algorithm then converges to the component \(e_0^{(2)}\) of \(e_0\) in \(ker[G^*]\). Convergence is monotonic in the norm \(\Vert e \Vert _0\).

Proof

As in the proof of the preceding result, the frequency domain condition and Theorem 4.7 prove the operator inequality \(U+U^* \ge \underline{g}I > 0\) on \(\mathscr {U}\) for some \(\underline{g}>0\) and hence that \(ker[U]=\{0\}\). The proof that \(U + U^* > \beta _G U^*G^*GU\) on \(\mathscr {R}[G^*]\) using the frequency domain condition then follows by applying Theorem 4.9 with a change in notation, replacing G(z) by G(z)U(z) and K(z) by U(z), noting that \(\beta ^* > \beta _G\) and that \(GUu \ne 0\) for all non-zero \(u \in \mathscr {R}[G^*]\). This last statement is proved by writing any non-zero \(u \in \mathscr {R}[G^*]\) as \(u=G^*w\), from which \(2\langle w,GUu \rangle _{\mathscr {Y}}=2\langle G^*w,Uu \rangle _{\mathscr {U}}=\langle u,(U+U^*)u \rangle _{\mathscr {U}} \ge \underline{g} \Vert u\Vert ^2_{\mathscr {U}}>0\). Using the finite dimensional nature of \(\mathscr {U}\) and \(\mathscr {Y}\) then indicates that \(L_U\) has norm strictly less than unity on \(\mathscr {R}[G]\) which proves the convergence statement.   \(\square \)

The following result is proved in a similar way,

Theorem 9.12

(Condition Four in Discrete State Space System Applications) A sufficient condition for Condition Four of Theorem 9.8 to hold is that U(z) is both invertible and minimum-phase, that G(z) is asymptotically stable and also that, for some \(\theta \in [0,1]\),

$$\begin{aligned} R\hat{U}(z) + \hat{U}^T(z^{-1})R > \theta \beta _{I}R + \beta _{G} (1 - \theta )G^T(z^{-1})QG(z),~~~ for ~~ all ~~ |z|=1. \end{aligned}$$
(9.97)

The Feedforward NOILC algorithm then converges to the component \(e_0^{(2)}\) of \(e_0\) in \(ker[G^*]\), this convergence being monotonic in the norm \(\Vert e \Vert _0\).

The results provide considerable insight into robustness of the NOILC procedure and some reassurance that robustness is an inherent property of the algorithmic process. Some areas of uncertainty do remain, as reflected by the observations that,

  1. 1.

    as presented, the conditions require the existence of a strictly positive real error U(z). When \(\ell > m\) such a description requires the definition and the use of redundant components in the kernel of G(z).

  2. 2.

    The assumption of asymptotic stability of U(z) suggests that the non-minimum-phase zeros of G must be those of GU.

  3. 3.

    The conditions require that \(ker[U]=\{0\}\) and hence that \(D_U\) is nonsingular. The single-input, single-output case leads to the interpretation that, in effect, the modelling error cannot change the relative degree of the model.

Theorems 9.10 and 9.11 provide frequency domain checks for the use of the model G in the presence of U but also link the robustness to the chosen value of \(\varepsilon ^2\) and the choice of Q and R (which appear explicitly and also, implicitly, in the value of \(\Vert G\Vert \)). These relationships are complex, but they can be simplified for the single-input, single-output case (\(m=\ell =1\)) to the conditions, respectively,

$$\begin{aligned} \left| \beta ^{-1} - U(z)\right| < \beta ^{-1} \quad \quad with ~~ \text {``gain''} \quad \beta = \frac{\varepsilon ^{-2}\Vert G\Vert ^2}{1+\varepsilon ^{-2}\Vert G\Vert ^2} \quad and \quad \Vert G\Vert ^2 = R^{-1}Q \Vert G(z)\Vert ^2_{\infty }, \nonumber \\ or, \quad \left| \beta ^{-1} - \left| G(z)\right| ^2U(z)\right| < \beta ^{-1} \quad \quad with ~~ \text {``gain''} \quad \beta = \varepsilon ^{-2}QR^{-1}, \end{aligned}$$
(9.98)

for all z satisfying \(|z|=1\). These are precisely the robustness conditions for inverse and steepest descent algorithms in Chaps. 6 and 7 and provide a link between these algorithms and NOILC. Both have graphical interpretations and underline the need for U(z) to be strictly positive real as well as the benefits (to robustness) of using smaller values of \(\beta \). Within this constraint, note that the different criteria provide different characterizations of the range of U(z) that can be tolerated (a simple numerical check of both tests is sketched after the list below).

  1. 1.

    In both cases, the modelling error can vary considerably, particularly if \(\beta \) is small or, equivalently, \(\varepsilon ^2\) is sufficiently large.

  2. 2.

    Although the second condition does not permit small values of \(\varepsilon ^2\), it may be of more practical use when G(z) is low pass but U(z) has high frequency resonances of large magnitude. The first condition will limit the value of gain \(\beta \) in such circumstances but, as the product \(|G(z)|^2U(z)\) attenuates the high frequency values of U, the second test may permit larger gain values.

  3. 3.

    The mixed robustness criterion of Theorem 9.12 represents a compromise solution where the frequency domain content of the right hand side of the test can be manipulated, in a limited way, by a choice of \(\theta \in [0,1]\). Conceptually, varying \(\theta \) changes the predicted permissible range of \(\varepsilon ^2\) that can be used. The natural choice is the value which maximizes this range.
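
The two SISO tests in (9.98) can be visualized or checked numerically as simple circle conditions: the frequency response U(z) (respectively \(|G(z)|^2U(z)\)) must remain inside the open disc of radius \(\beta ^{-1}\) centred at \(\beta ^{-1}\). The following hedged sketch uses illustrative first-order transfer functions and illustrative gain values.

```python
# Hedged grid check of the SISO circle tests (9.98) on the unit circle.
import numpy as np

def inside_disc(F_of_z, beta, n_grid=2000):
    w = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    F = np.array([F_of_z(np.exp(1j * wi)) for wi in w])
    return bool(np.all(np.abs(1.0 / beta - F) < 1.0 / beta))

G_of_z = lambda z: 0.5 / (z - 0.8)          # illustrative low-pass model G(z)
U_of_z = lambda z: (z + 0.4) / (z + 0.1)    # illustrative modelling error U(z)

print(inside_disc(U_of_z, beta=0.3))                                      # first test in (9.98)
print(inside_disc(lambda z: abs(G_of_z(z)) ** 2 * U_of_z(z), beta=0.3))   # second test in (9.98)
```

With these illustrative numbers the first test passes while the second does not, emphasizing that the two criteria characterize different admissible sets of U(z) for a given gain value.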

9.2.4 The Case of Left Multiplicative Modelling Errors

This case uses the model G in the situation where the actual plant is described by UG, where the left multiplicative modelling error operator \(U: \mathscr {Y} \rightarrow \mathscr {Y}\) is linear and bounded. As with right multiplicative modelling errors, the error evolution in the feedforward implementation is described by a recursion

$$\begin{aligned} \begin{array}{rcl} e_{k+1}= L_Ue_k &{} ~~ where ~~ &{} L_U = I - \varepsilon ^{-2} UGG^*L = I - U(\!I-L\!), \\ with \quad L=(I+\varepsilon ^{-2}GG^*)^{-1} &{} ~~ and ~~ &{} I-L = \varepsilon ^{-2}G(I + \varepsilon ^{-2}G^*G)^{-1}G^*. \end{array} \end{aligned}$$
(9.99)

A simple calculation shows that errors in \(ker[G^*]\) are unchanged by the iterative process and that error changes occur only in the subspace \(\mathscr {R}[UG]\). The difference between left and right multiplicative modelling errors is therefore that the subspace where the dynamics is concentrated depends on the modelling error itself. The technical issue that arises from this fact can be expressed in terms of the decomposition of all signals into components in \(ker[G^*]\) and \(\mathscr {R}[UG]\) or its closure \(\overline{\mathscr {R}[UG]}\).

Theorem 9.13

(Properties of U, \(ker[G^*]\), \(\mathscr {R}[UG]\) and \(\overline{\mathscr {R}[UG]}\)) Assuming the notation described above, the condition \(ker[G^*] \cap \mathscr {R}[UG]=\{0\}\) is satisfied if

$$\begin{aligned} U+U^* > 0 ~~on \quad \mathscr {R}[G]. \end{aligned}$$
(9.100)

In addition, if, for some scalar \(\varepsilon _0^2 > 0\),

$$\begin{aligned} U+U^* \ge \varepsilon _0^2I ~~on \quad \mathscr {Y}, \end{aligned}$$
(9.101)

then U has a bounded inverse \(\hat{U}\) on \(\mathscr {Y}\) and the following relationships are satisfied

$$\begin{aligned} ker[G^*] \cap \overline{\mathscr {R}[UG]}=\{0\} \quad and \quad \mathscr {Y}= ker[G^*] \oplus \overline{\mathscr {R}[UG]}. \end{aligned}$$
(9.102)

Notes:  

  1. 1.

    A continuity argument also indicates that the existence of \(\varepsilon _0>0\) implies that the positivity condition \(U+U^* \ge \varepsilon _0^2 I\) remains valid on \(\overline{\mathscr {R}[G]}\).

  2. 2.

    The simplest situation is when \(ker[G^*] = \{0\}\), in which case \(\mathscr {R}[G]\) is dense in \(\mathscr {Y}\).

Proof of Theorem 9.13  If \(ker[G^*] \cap \mathscr {R}[UG]\ne \{0\}\), there exists a non-zero \(y \in \mathscr {R}[G]\) and a \(u \in \mathscr {U}\) such that \(y=Gu\) and \(G^*Uy=G^*UGu=0\). It follows that \(\langle y,(U+U^*)y \rangle _{\mathscr {Y}}=\langle Gu,(U+U^*)Gu \rangle _{\mathscr {Y}}=2\langle u,G^*UGu \rangle _{\mathscr {U}}=0\) which contradicts the assumed strict positivity of \(U+U^*\) on \(\mathscr {R}[G]\). Next, given the existence of \(\varepsilon _0^2>0\), the invertibility of U follows from Theorem 2.9. Now suppose that there exists a non-zero vector \(y \in \overline{\mathscr {R}[UG]}\) such that \(G^*y=0\). Let \(\delta >0\) be arbitrary and choose \(w \in \mathscr {R}[UG]\) such that \(\Vert y-w\Vert < \delta \). Note that \(w \ne 0\) if \(\delta < \Vert y\Vert _{\mathscr {Y}}\) is sufficiently small as \(\Vert w\Vert > \Vert y\Vert -\delta \). Write \(w=UGv\) and \(G^*y=G^*\left( UGv + (y-w) \right) =0\) and use the Cauchy–Schwarz inequality to deduce that

$$\begin{aligned} \varepsilon _0^2\Vert Gv\Vert _{\mathscr {Y}}^2 \le \langle Gv, (U+U^*)Gv \rangle _{\mathscr {Y}} \le 2|\langle v, G^*(y-w)\rangle _{\mathscr {U}}| < 2 \Vert Gv\Vert _{\mathscr {Y}} \delta . \end{aligned}$$
(9.103)

As \( w \ne 0\), \(Gv \ne 0\) and hence \(\varepsilon _0^2 \Vert Gv\Vert < 2 \delta \). Writing \(\Vert w\Vert \le \Vert U\Vert \Vert Gv\Vert \) then produces

$$\begin{aligned} \Vert y\Vert - \delta < \Vert w\Vert \le \Vert U\Vert \Vert Gv\Vert < 2 \varepsilon _0^{-2}\Vert U\Vert \delta \quad for ~~ all~~ \delta > 0. \end{aligned}$$
(9.104)

Letting \(\delta \rightarrow 0\) then gives \(\Vert y\Vert =0\), that is, \(y=0\), which is a contradiction. This proves that \(ker[G^*] \cap \overline{\mathscr {R}[UG]}=\{0\}\).

Next, suppose that there exists a non-zero vector y that is orthogonal to the vector subspace \(\mathscr {S}=ker[G^*] \oplus \overline{\mathscr {R}[UG]} \subset \mathscr {Y}\). It follows that it is also orthogonal to the subset \(\mathscr {S}_0 = ker[G^*] \oplus U\overline{\mathscr {R}[G]} \subset \mathscr {S}\). Writing an arbitrarily chosen \(x \in \mathscr {Y}\) in the form \(x = g^* + g\) with \( g^* \in ker[G^*]\) and \(g \in \overline{\mathscr {R}[G]}\), the elements in \(\mathscr {S}_0\) then take the form \(g^*+Ug\). Examination of the orthogonality condition \(\langle y, g^*+Ug \rangle =0\) for all x implies that \(\langle y, g^* \rangle = 0\) and hence that \(y \in ker[G^*]^{\perp } = \overline{\mathscr {R}[G]}\). Choosing \(g=y\) then gives \(0 = \langle y, Uy \rangle \ge \frac{1}{2}\varepsilon _0^2 \Vert y\Vert ^2\) yielding \(y=0\) which is a contradiction. Hence \(\mathscr {S}^{\perp } \subset \mathscr {S}_0^{\perp } =\{0\}\) proving denseness and \(\overline{\mathscr {S}}_0 = \overline{\mathscr {S}}=\mathscr {Y}\).

Finally, note that \(\mathscr {S}_0\) is the range of the bounded linear operator \(\varOmega : \mathscr {Y} \rightarrow \mathscr {Y}\) defined by \(\varOmega = (I-P) + UP\) where P is the self-adjoint, positive, orthogonal projection operator onto \(\overline{\mathscr {R}[G]}\). Note that \(P^2=P\), \(ker[P]=ker[G^*]\), \(\mathscr {R}[P]=\overline{\mathscr {R}[G]}\) and the restriction of P to \(\overline{\mathscr {R}[G]}\) is the identity. To prove that \(\mathscr {S}_0= \mathscr {S}= \mathscr {Y}\) it is sufficient to prove that the range of \(\varOmega \) is closed in \(\mathscr {Y}\). From the Closed Range Theorem 2.11, it is sufficient to prove that the range of \(\varOmega ^*\) is closed. Note that \(\varOmega ^*=(I-P) + PU^*\) and choose \(y \in \overline{\mathscr {R}[\varOmega ^*]}\) and a sequence \(\{x_k\}_{k \ge 1}\) in \(\mathscr {Y}\) with the properties that \(\Vert y-\varOmega ^*x_k\Vert < k^{-1}\). Using orthogonality,

$$\begin{aligned} k^{-2} > \Vert (I-P)(y-x_k)\Vert ^2 + \Vert P(y-U^*x_k)\Vert ^2, \quad for ~~ all ~~ k \ge 1. \end{aligned}$$
(9.105)

In particular, \(\lim _{k \rightarrow \infty }~ (I-P)x_k = (I-P)y\) and \(\lim _{k \rightarrow \infty } ~P(y-U^*x_k)=0\). Writing

$$\begin{aligned} \begin{array}{c} P(y - U^*x_k) = P(y -U^*(I-P)x_k) -PU^*Px_k \quad and \quad Px_k = P(Px_k) \\ then~~gives \quad \lim \nolimits _{k \rightarrow \infty }~PU^*P(Px_k) = Py-PU^*(I-P)y. \end{array} \end{aligned}$$
(9.106)

Writing \(x_k = (I-P)x_k + Px_k\), the convergence of \(\{x_k\}\) to a limit \(x_{\infty } \in \mathscr {Y}\) (which automatically satisfies \(y=\varOmega ^*x_{\infty }\)) then follows by regarding \(\overline{\mathscr {R}[G]}\) as a Hilbert space with the inner product and norm inherited from \(\mathscr {Y}\), regarding \(PU^*P\) as a map from \(\overline{\mathscr {R}[G]}\) into itself and proving that it has a bounded inverse. First note that \(PU^*P\) has an adjoint PUP. Then use the inequalities,

$$\begin{aligned} 0 \le P(I-\lambda U)P(I-\lambda U^*)P \quad and \quad 0 \le P(I-\lambda U^*)P(I-\lambda U)P, \end{aligned}$$
(9.107)

noting that \(P^2=P\), and that \(PUP+PU^*P~ \ge ~\varepsilon _0^2I\) on \(\overline{\mathscr {R}[G]}\), to give (by suitable choice of \(\lambda \)) the existence of a scalar \(\alpha >0\) such that \((PUP)PU^*P \ge \alpha I\) and \((PU^*P)PUP \ge \alpha I\) on \(\overline{\mathscr {R}[G]}\). The existence of a bounded inverse then follows from Theorem 2.9. This completes the proof.   \(\square \)

The result will be used to derive robustness conditions. In particular, the characterization \(\mathscr {Y}=ker[G^*]\oplus \overline{\mathscr {R}[UG]}\) provides a parallel to the condition \(\mathscr {Y}=ker[G^*]\oplus \overline{\mathscr {R}[G]}\) used in the right multiplicative modelling error case. Before this is done, a suitable norm with which to measure robust monotonicity and convergence is needed. First note that any initial error \(e_0\) can be expressed, uniquely, as \(e_0 = e_0^{(1)} + e_0^{(2)}\) with \(e_0^{(1)} \in \overline{\mathscr {R}[UG]}\) and \(e_0^{(2)} \in ker[G^*]\) so that \(L_Ue_0^{(2)}=e_0^{(2)}\) and hence

$$\begin{aligned} e_k = L_U^k e_0^{(1)} + e_0^{(2)} \quad for ~~ all ~~ k \ge 0. \end{aligned}$$
(9.108)

Errors therefore evolve, essentially, in \(\overline{\mathscr {R}[UG]}\). For the purposes of analysis, this closed subspace is regarded as a Hilbert space with inner product \(\langle \cdot , \cdot \rangle _0\) defined by

$$\begin{aligned} \langle y, w \rangle _0 = \langle y, (I-L)w \rangle _{\mathscr {Y}} \quad for ~~ all~~ y,w ~~ in ~~ \overline{\mathscr {R}[UG]}. \end{aligned}$$
(9.109)

The inner product satisfies the linearity requirement with respect to each argument and also satisfies the required positivity properties as, for any non-zero \(y \in \mathscr {Y}\),

$$\begin{aligned} \Vert y\Vert _0^2&= \langle y, (I-L)y \rangle _{\mathscr {Y}} = \varepsilon ^{-2}\langle G^*y, (I+\varepsilon ^{-2}G^*G)^{-1}G^*y \rangle _{\mathscr {U}}\nonumber \\&\ge \frac{\varepsilon ^{-2}}{1 + \varepsilon ^{-2}\Vert G\Vert ^2}\Vert G^*y\Vert ^2_{\mathscr {U}} \end{aligned}$$
(9.110)

and the necessary positivity follows as \(G^*y \ne 0\) if \( y \in \overline{\mathscr {R}[UG]}\).

Note: The norm \(\Vert \cdot \Vert _0\) on \(\overline{\mathscr {R}[UG]}\) is not a norm on \(\mathscr {Y}\) if \(ker[G^*] \ne \{0\}\). More generally, it is not necessarily topologically equivalent to the norm \(\Vert \cdot \Vert _{\mathscr {Y}}\) applied to \(\overline{\mathscr {R}[UG]}\) but it will be

  1. 1.

    if \(ker[G^*] = \{0\}\) and \(\mathscr {R}[UG]\) is finite dimensional, as is the case when G represents a discrete time state space system S(ABCD) on a finite time interval with \(m \le \ell \) and \(rank[D]=m\),

  2. 2.

    or, if \(G^*\) has positivity properties such as \(GG^* \ge \varepsilon _1^2I\) on \(\mathscr {R}[G]\) for some \(\varepsilon _1 > 0\).

In such circumstances, convergence is guaranteed with respect to the norm \(\Vert \cdot \Vert _{\mathscr {Y}}\). More generally, the nature of any convergence could be complex and it will be good practice to consider the nature of convergence with respect to this norm for the model class under consideration. The following result does not resolve the issue but it provides some insight into the issues that may arise. In particular, a proof of boundedness will imply weak convergence.

Theorem 9.14

(Robustness, Boundedness and Weak Convergence) Using the notation defined above, \(~\lim _{k \rightarrow \infty }\Vert e_k\Vert _0=0~\) if, and only if, the condition \(~\lim _{k \rightarrow \infty } \Vert G^*e_k\Vert _{\mathscr {U}}=0\) is satisfied. In addition, if \(~\lim _{k \rightarrow \infty } \Vert e_k\Vert _0=0\), then

$$\begin{aligned} \lim _{k \rightarrow \infty }\langle f, e_k - e_0^{(2)} \rangle _{\mathscr {Y}} = 0, \quad for ~~ all \quad f \in \mathscr {R}[G] \oplus \mathscr {R}[UG]^\perp . \end{aligned}$$
(9.111)

If \(\mathscr {R}[G]\) is closed, then \(\{e_k - e_0^{(2)}\}_{k \ge 0}\) converges to zero in the weak topology in \(\mathscr {Y}\) defined by the inner product \(\langle \cdot , \cdot \rangle _{\mathscr {Y}}\). Otherwise this weak convergence is assured if the sequence \(\{\Vert e_k\Vert _{\mathscr {Y}}\}_{k \ge 0}\) is bounded.

Proof

The first statement follows as the two norms \(\Vert e \Vert _0\) and \(\Vert G^*e \Vert _{\mathscr {U}}\) in \(\mathscr {Y}\) are topologically equivalent. To prove the second, note that it is necessary only to prove the statement for \(e_k - e_0^{(2)} \in \overline{\mathscr {R}[UG]}\). The first step is to prove that

$$\begin{aligned} \mathscr {Y}= \overline{\mathscr {R}[G]} \oplus ker[(UG)^*] = \overline{\mathscr {R}[G]} \oplus \mathscr {R}[UG]^\perp \end{aligned}$$
(9.112)

by using Theorem 9.13 and noting that the invertibility of U implies that \(\hat{U} + \hat{U}^* \ge \varepsilon _0^2 \hat{U}^* \hat{U} \ge \tilde{\varepsilon }_0^2 I\) for some \(\tilde{\varepsilon }_0^2 > 0\). Again using Theorem 9.13 with the replacements \(G\mapsto U G\), \(U \mapsto \hat{U}\) and \(\varepsilon _0 \mapsto \tilde{\varepsilon }_0\), it is then deduced that \(\mathscr {Y}= \overline{\mathscr {R}[G]} \oplus ker[( U G)^*] = \overline{\mathscr {R}[G]} \oplus \mathscr {R}[UG]^\perp \) as required. Let \(f=f_1 + f_2\) with \(f_1 = G f_3 \in \mathscr {R}[G]\) and \(f_2 \in \mathscr {R}[UG]^\perp \) so that \(\langle f, e_k - e_0^{(2)} \rangle _{\mathscr {Y}} = \langle f_1, e_k - e_0^{(2)} \rangle _{\mathscr {Y}} = \langle f_3, G^*(e_k - e_0^{(2)}) \rangle _{\mathscr {U}} \rightarrow 0\) as \(k \rightarrow \infty \) if \(\lim _{k \rightarrow \infty } \Vert e_k \Vert _0=0\). Weak convergence then follows if \(\mathscr {R}[G] = \overline{\mathscr {R}[G]}\). Otherwise, boundedness of the sequence \(\{\Vert e_k\Vert _{\mathscr {Y}}\}_{k \ge 0}\) permits the extension of this result to include cases where \(f_1 \in \overline{\mathscr {R}[G]}\).   \(\square \)

The sequence \(\{\Vert e_k\Vert _{\mathscr {Y}}\}_{k \ge 0}\) will typically not be monotonic. However, using the new topology, the monotonicity of \(\{\Vert e_k - e_0^{(2)}\Vert _0\}_{k \ge 0}\) can be proved in the following form,

Theorem 9.15

(Monotonicity and Robustness with Left Multiplicative Errors) Consider NOILC Algorithm 9.1 in its feedforward implementation. Using the notation defined above, a sufficient condition for the norm sequence \(\{\Vert e_k - e_0^{(2)}\Vert _0\}_{k \ge 0}\) to satisfy the monotonicity condition \(\Vert e_{k+1}- e_0^{(2)}\Vert _0 < \Vert e_k - e_0^{(2)}\Vert _0\), for all \(k \ge 0\), in the presence of the left multiplicative modelling error U, is that \(e_0^{(1)} \ne 0\) and that there exists a real number \(\varepsilon _0^2 > 0\) such that

$$\begin{aligned} Condition~~One - U+U^*&\ge U^*(I-L)U + \varepsilon _0^2 I,&~~on ~~ \mathscr {R}[G]. \end{aligned}$$
(9.113)

In these circumstances the subspace decomposition \(\mathscr {Y} = ker[G^*] \oplus \overline{\mathscr {R}[UG]}\) is valid and, in the new topology,

  1. 1.

    the induced norm of the restriction of \(L_U\) to the \(L_U\)-invariant subspace \(\overline{\mathscr {R}[UG]}\) satisfies

    $$\begin{aligned} \Vert L_U\Vert _0 \le 1. \end{aligned}$$
    (9.114)
  2. 2.

    In particular, if \(\mathscr {R}[G]\) is finite dimensional, then this norm is strictly less than unity and, in the original norm topology of \(~\mathscr {Y}\),

    $$\begin{aligned} for ~~ all~~starting ~~ conditions \quad e_0 \in \mathscr {Y},\quad \quad \lim _{k \rightarrow \infty } e_k = e_0^{(2)} \in ker[G^*]. \end{aligned}$$
    (9.115)

Proof

The subspace representation follows from Theorem 9.13 and it is clear that \(e_{k+1}- e_0^{(2)}=L_U(e_k- e_0^{(2)})\) for \(k \ge 0\). Next, note that, if \(e_0^{(1)}=0\), then \(e_k =e_0\) for all \(k \ge 0\). Suppose therefore that \(e_0^{(1)}\ne 0\) and note that \(ker[I-L] = ker[G^*]\) so that \(\overline{\mathscr {R}[I-L]}=ker[G^*]^{\perp }=\overline{\mathscr {R}[G]}\). Examine the operator \(\varLambda : \mathscr {Y} \rightarrow \mathscr {Y}\) defined by

$$\begin{aligned} \varLambda&= (I-L) - L_U^*(I-L)L_U = (I-L)\left( U+U^* - U^*(I-L)U \right) (I-L)\nonumber \\&\ge \varepsilon _0^2 (I-L)^2 \end{aligned}$$
(9.116)

and note that, as a consequence, \(\langle e, \varLambda e \rangle _{\mathscr {Y}} \ge \varepsilon _0^2 \Vert (I-L)e\Vert ^2_{\mathscr {Y}}\). In particular, \(\langle e, \varLambda e \rangle _{\mathscr {Y}} > 0\) for non-zero elements of any subset that intersects with \(ker[G^*]\) at \(e=0\) only. The relevant example of such a set is \(\overline{\mathscr {R}[UG]}\) from which, for all non-zero \(e \in \overline{\mathscr {R}[UG]}\),

$$\begin{aligned} \langle e, \varLambda e\rangle _{\mathscr {Y}} > 0 \quad which~~ is ~~just \quad \Vert e\Vert ^2_0 > \Vert L_Ue\Vert _0^2 \quad on~~\overline{\mathscr {R}[UG]} \end{aligned}$$
(9.117)

which proves monotonicity. The observation that \(\Vert L_U\Vert _0 \le 1\) on \(\overline{\mathscr {R}[UG]}\) follows from the definition of operator norms whilst the strict inequality \(\Vert L_U\Vert _0 < 1\), and the convergence property, when \(\overline{\mathscr {R}[UG]}\) is finite dimensional, follows from the compactness of the unit sphere in any finite dimensional Hilbert space.   \(\square \)
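
Condition One of Theorem 9.15 can again be examined numerically for finite-dimensional data. The following hedged sketch checks a sufficient version of the condition on the whole output space (Euclidean norms and illustrative data assumed) by estimating the largest admissible \(\varepsilon _0^2\) as the smallest eigenvalue of the relevant symmetric matrix.

```python
# Hedged numerical check of Condition One (9.113) for a left multiplicative error U.
import numpy as np

rng = np.random.default_rng(4)
n_y, n_u = 4, 4
G = rng.standard_normal((n_y, n_u))
U = np.eye(n_y) + 0.15 * rng.standard_normal((n_y, n_y))   # illustrative left modelling error
eps2 = 1.0

L = np.linalg.inv(np.eye(n_y) + G @ G.T / eps2)
M = (U + U.T) - U.T @ (np.eye(n_y) - L) @ U
eps0_sq = np.min(np.linalg.eigvalsh(0.5 * (M + M.T)))
print(eps0_sq > 0.0, eps0_sq)   # a positive value indicates robust monotonicity of ||e_k - e_0^(2)||_0
```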

The following result follows using similar techniques to Theorem 9.8,

Theorem 9.16

(Alternative, Simplified Robustness Conditions) Two sufficient conditions for Condition One of Theorem 9.15 to be valid for some choice of \(\varepsilon _0 > 0\), are as follows

$$\begin{aligned} \begin{array}{rlll} Condition&{} Two ~: &{} ~ U+U^* \ge \frac{\varepsilon ^{-2}\Vert G^*\Vert ^2}{1 + \varepsilon ^{-2}\Vert G^*\Vert ^2}U^*U + \varepsilon _0^2 I \quad &{} on \quad \mathscr {R}[G] \\ Condition &{} Three ~: &{} ~ U+U^* \ge \varepsilon ^{-2}U^*GG^*U + \varepsilon _0^2 I \quad &{} on \quad \mathscr {R}[G] \end{array} \end{aligned}$$
(9.118)

Finally, another sufficient condition is that U has a bounded inverse \(\hat{U}\) on \(\mathscr {Y}\), and, with a suitable choice of \(\varepsilon _0>0\) and \(\theta \in [0,1]\),

$$\begin{aligned} \begin{array}{rll} Condition ~~ Four ~: &{} \hat{U} + \hat{U}^* \ge \theta \beta _{I}I + (1-\theta )\beta _{G}GG^* + \varepsilon _0^2 I~ \quad &{}on ~~ \mathscr {Y} \\ where \quad &{}\quad \beta _{I}=\frac{\varepsilon ^{-2}\Vert G^*\Vert ^2}{1 + \varepsilon ^{-2}\Vert G^*\Vert ^2} \quad and \quad \beta _{G}=\varepsilon ^{-2}.&{} \end{array} \end{aligned}$$
(9.119)

Note that Condition Four is related to conditions Two and Three by choice of \(\theta =0\) or \(\theta =1\). A simple illustration of this fact uses algebraic manipulation to give \(U+U^* \ge \theta \beta _{I}U^*U + (1-\theta )\beta _{G}U^*GG^*U + \varepsilon _0^2 U^*U\) on \(\mathscr {Y}\) and, noting that \(U^*U \ge \alpha I\) for some \(\alpha > 0\), then replacing \(\varepsilon _0^2\) by \(\alpha \varepsilon _0^2\).

9.2.5 Discrete Systems with Left Multiplicative Modelling Errors

The results derived for the case of right multiplicative perturbations carry over to discrete state space systems with a few minor changes. More precisely, consider the case when G can be represented by a discrete time, linear, time invariant, state space model S(ABCD) (or its equivalent supervector description) on the interval \(0 \le t \le N\) and has the transfer function matrix G(z). Using the notation of Sect. 9.2.3, the actual plant model is now assumed to take the form UG with left multiplicative modelling error \(U: \mathscr {Y} \rightarrow \mathscr {Y}\) with state space model \(S(A_U,B_U,C_U,D_U)\) and \(m \times m\) transfer function matrix U(z). Again, \(\mathscr {Y}\) and \(\mathscr {U}\) are finite dimensional and hence every vector subspace is closed.

The following results can be proved by requiring the operator inequalities to be valid in the whole space \(\mathscr {Y}\) or \(\mathscr {U}\) as appropriate. Also note that the finite dimensional nature of the spaces and the compactness of the unit circle in the complex plane make it possible to replace \(\varepsilon _0>0\) by \(\varepsilon _0=0\) provided that the symbol \(\ge \) is replaced by the strict inequality \(>\).

Theorem 9.17

(Condition Two for Discrete, State Space System Applications) Let the model G, the actual plant UG and the NOILC objective function be as described above. Then a sufficient condition for Condition Two of Theorem 9.16 to hold is that U(z) is asymptotically stable and also that, in the complex Euclidean topology in \(\mathscr {C}^m\), the matrix inequality

$$\begin{aligned} QU(z) + U^T(z^{-1})Q > \beta _{I} U^T(z^{-1})QU(z),~ for ~~ all ~~ |z|=1, \end{aligned}$$
(9.120)

is satisfied where the “gain” parameter

$$\begin{aligned} \beta _{I} = \frac{\varepsilon ^{-2}\Vert G^*\Vert ^2}{ 1+ \varepsilon ^{-2}\Vert G^*\Vert ^2}. \end{aligned}$$
(9.121)

In these circumstances, the Feedforward NOILC algorithm converges to the component \(e_0^{(2)}\) of \(e_0\) in \(ker[G^*]\). The convergence of \(e_k - e_0^{(2)}\) to zero is monotonic in the norm \(\Vert e \Vert _0 = \sqrt{\langle e,(I-L)e \rangle }\) on \(\mathscr {R}[UG]\).

Finally, for computational purposes, \(\Vert G^*\Vert ^2\) can be replaced in the formula for \(\beta _{I}\) by its upper bound

$$\begin{aligned} \Vert G^*\Vert ^2 =\Vert G\Vert ^2 \le \sup _{|z|=1}~r\left( R^{-1}G^T(z^{-1})QG(z) \right) , \end{aligned}$$
(9.122)

a relationship that links the frequency domain condition to the weighting matrices Q and R.

The proof is left as an exercise for the reader to fill in. The following result is also proved in a similar manner to Theorem 9.12,

Theorem 9.18

(Condition Four in Discrete State Space System Applications) A sufficient condition for Condition Four to hold is that G(z) is asymptotically stable, U(z) is invertible and minimum-phase and also that there exists a \(\theta \in [0,1]\) such that, on \(\mathscr {C}^m\), for all \(|z|=1\),

$$\begin{aligned} Q\hat{U}(z) + \hat{U}^T(z^{-1})Q > \theta \beta _{I}Q + (1 - \theta ) \beta _{G}QG(z)R^{-1}G^T(z^{-1})Q. \end{aligned}$$
(9.123)

In these circumstances, the Feedforward NOILC algorithm converges to the component \(e_0^{(2)}\) of \(e_0\) in \(ker[G^*]\). The convergence of \(e_k - e_0^{(2)}\) is monotonic in the norm \(\Vert e \Vert _0 = \sqrt{\langle e,(I-L)e \rangle }\) on \(\mathscr {R}[UG]\).

9.2.6 Monotonicity in \(\mathscr {Y}\) with Respect to the Norm \(\Vert \cdot \Vert _{\mathscr {Y}}\)

This short section is relevant to the situation where the model G is related to the plant by two possible uncertainty descriptions \(U_LG\) and \(GU_R\), where the left multiplicative perturbation \(U_L\) and the right multiplicative perturbation \(U_R\) both satisfy the conditions of the previous sections for monotonicity in the relevant norm topologies. It is assumed here that \(\overline{\mathscr {R}[G]}= \mathscr {Y}\) and \(\overline{\mathscr {R}[G^*]}= \mathscr {U}\) so that \(ker[G^*] = \{0\}\) and hence \(e_0^{(2)}=0\). The monotonicity conditions become, for all \(e_0 \in \mathscr {Y}\),

$$\begin{aligned} \langle e_{k+1}, L e_{k+1} \rangle _{\mathscr {Y}} < \langle e_k, L e_k \rangle _{\mathscr {Y}} \quad and \quad \langle e_{k+1}, (I-L) e_{k+1} \rangle _{\mathscr {Y}} < \langle e_k, (I-L) e_k \rangle _{\mathscr {Y}}, \end{aligned}$$
(9.124)

so that monotonicity with respect to the norm \(\Vert \cdot \Vert _{\mathscr {Y}}\) follows by adding to give

$$\begin{aligned} \Vert e_{k+1}\Vert ^2_{\mathscr {Y}} < \Vert e_k\Vert ^2_{\mathscr {Y}} \quad for~~all ~~ k \ge 0. \end{aligned}$$
(9.125)

The reader will also note that this ensures boundedness but, if \(\mathscr {R}[G]\) is finite dimensional, it also ensures that \(\lim _{k \rightarrow \infty } e_k = 0\). Finally,

Theorem 9.19

(Monotonicity of \(\Vert e_k\Vert _{\mathscr {Y}}\) with Commuting Error Models) If \(U_L\) commutes with G (as is the case with linear, single-input, single-output, time-invariant, state space systems), then \(U_R=U_L\) and satisfaction of any of the robustness conditions guarantees monotonicity of the norm sequence \(\{\Vert e_k\Vert _{\mathscr {Y}}\}_{k \ge 0}\).
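
The commuting property invoked in Theorem 9.19 is easily seen for SISO, time-invariant, discrete systems on a finite interval: their lifted (supervector) matrices are lower-triangular Toeplitz matrices and hence commute. The following hedged sketch illustrates this fact with arbitrary, illustrative impulse responses.

```python
# Hedged illustration: lifted matrices of SISO, time-invariant systems commute.
import numpy as np

def lift(h):
    """Lower-triangular Toeplitz (supervector) matrix built from an impulse response h."""
    n = len(h)
    M = np.zeros((n, n))
    for i in range(n):
        M[i, :i + 1] = h[i::-1]
    return M

N = 15
g = 0.8 ** np.arange(N + 1)                    # illustrative impulse response of the model G
u = np.r_[1.0, 0.3 * 0.5 ** np.arange(N)]      # illustrative impulse response of the error U_L
G_lift, U_lift = lift(g), lift(u)
print(np.allclose(G_lift @ U_lift, U_lift @ G_lift))   # expected: True
```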

9.3 Non-minimum-phase Properties and Flat-Lining

Previous chapters (Sect. 8.2.4) have noted that, for single-input, single-output, discrete time state space systems, the presence of non-minimum-phase zeros has a very specific effect on Iterative Control convergence if the number of samples \(N+1\) is large. It takes the form of initially good norm reductions followed by a “plateauing” (or “flat-lining”) phenomenon where the norm \(\Vert e_k\Vert _{\mathscr {Y}}\) is non-zero but begins to reduce infinitesimally slowly. Convergence to zero is still guaranteed theoretically but this slow convergence, in practical terms, means that no further realistic improvement in tracking error is possible using the chosen algorithm. Indeed, tens of thousands of iterations may be needed to achieve even small further improvements. This can be understood, conceptually, by taking an error norm sequence of the form

$$\begin{aligned} \Vert e_k\Vert _{\mathscr {Y}} = \frac{1}{2}\Vert e_0\Vert _{\mathscr {Y}}\left( \lambda _1^k + \lambda _2^k \right) \end{aligned}$$
(9.126)

where the two positive scalars \(\lambda _1\) and \(\lambda _2\) are strictly less than one but \(\lambda _1 < \lambda _2\) and \(\lambda _2\) is very close to unity. If, for example, \(\lambda _1=0.1\) and \(\lambda _2 = 1- 10^{-8} < 1\), then, after only a few iterations, the first term becomes very small leaving a residual convergence represented by \(\Vert e_k\Vert _{\mathscr {Y}} = \frac{1}{2}\Vert e_0\Vert _{\mathscr {Y}}\lambda _2^k\) which remains very close to \(\frac{1}{2}\Vert e_0\Vert _{\mathscr {Y}}\) for many thousands of iterations.
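A few lines of arithmetic (a hypothetical illustration using the values quoted above) make the flat-lining behaviour concrete:

```python
import numpy as np

# A small numerical illustration of the two-rate model (9.126): with
# lambda_1 = 0.1 and lambda_2 = 1 - 1e-8 the norm drops quickly for a few
# iterations and then "flat-lines" close to ||e_0||/2 for thousands more.
lam1, lam2, e0 = 0.1, 1.0 - 1e-8, 1.0
for k in [0, 1, 5, 10, 100, 10_000, 1_000_000]:
    ek = 0.5 * e0 * (lam1 ** k + lam2 ** k)
    print(f"k = {k:>9d}   ||e_k|| = {ek:.6f}")
# After k = 5 the first term is negligible; even after a million iterations
# the norm has only fallen to roughly 0.5 * exp(-0.01), i.e. about 0.495.
```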

In what follows, a brief discussion of the same phenomenon is given with the conclusion that NOILC and gradient algorithms have identical properties in this case. The approach taken is that of constructing a model of the convergence that reveals the plateauing phenomenon by, firstly, identifying subspaces associated with normal convergence rates and slow convergence and then modelling the plateauing effect by setting the slow convergence rate to precisely zero.

The analysis is restricted to the case of single-input, single-output, asymptotically stable, discrete time state space systems S(A, B, C, D) described by the transfer function \(G(z) = G_m(z)G_{ap}(z)\) where \(G_{ap}(z)\) is all-pass and \(G_m(z)\) is asymptotically stable and minimum-phase. It is assumed that \(D \ne 0\) (arranged, if necessary, using the shift techniques of Chap. 4) and hence that the supervector model matrices G and \(G_m\) are invertible. The notation of Sect. 8.2.4 is used and, to simplify the discussion, the simplest case of \(n_+=1\) with a single real zero of modulus \(|z_1| > 1\) is assumed. Theorem 8.3 in Sect. 8.2.4 is particularly relevant to the development.

Using supervector descriptions, \(GG^*\) can be identified with \(\varepsilon ^{-2}R^{-1}QGG^T\) which takes the form of the symmetric matrix \(\varepsilon ^{-2}R^{-1}QG_mG_{ap}G^T_{ap}G_m^T\). The factor \(G_{ap}G^T_{ap}\) is the key to understanding the phenomenon. More precisely, its eigenvalues are all equal to unity apart from a single eigenvalue \(\sigma _1^2\) which takes the value \(z_1^{-2(N+1)} \rightarrow 0\) as \(N \rightarrow \infty \). This eigenvalue has eigenvector \(~\alpha _1= [1, z_1^{-1}, \ldots , z_1^{-N}]^T\) spanning the subspace \(\mathscr {E}_{a+}\). The orthogonal complement of this subspace is denoted by \(\mathscr {E}_1\) and, if \(V: \mathscr {R}^N \rightarrow \mathscr {R}^{N+1}\) is an \((N+1) \times N\) matrix whose columns span \(\mathscr {E}_1\), then \(V^T\alpha _1=0\). Without loss of generality, V can be constructed to satisfy \(V^TV=I_N\).
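The subspace construction above is easily reproduced numerically. The sketch below (hypothetical values of \(z_1\) and N) builds \(\alpha_1\), an orthonormal basis V of its orthogonal complement and the small eigenvalue \(z_1^{-2(N+1)}\):

```python
import numpy as np

# A minimal sketch (hypothetical values) of the subspace construction:
# alpha_1 spans E_{a+} and the columns of V form an orthonormal basis of its
# orthogonal complement E_1, so that V^T alpha_1 = 0 and V^T V = I_N.
z1, N = 2.0, 20                               # a single real NMP zero, |z1| > 1
alpha1 = z1 ** (-np.arange(N + 1))            # [1, z1^-1, ..., z1^-N]^T

# Orthonormal complement of span{alpha1} via a full QR decomposition.
Qfull, _ = np.linalg.qr(alpha1.reshape(-1, 1), mode="complete")
V = Qfull[:, 1:]                              # (N+1) x N

print(np.allclose(V.T @ alpha1, 0.0))         # True
print(np.allclose(V.T @ V, np.eye(N)))        # True
print(z1 ** (-2 * (N + 1)))                   # the small eigenvalue sigma_1^2
```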

The key to understanding the behaviour when \(N \gg 0\) is obtained by using the approximation \(\sigma _1 =0\) and using the orthogonal subspace decomposition

$$\begin{aligned} \mathscr {Y} = G_m\mathscr {E}_1 \oplus (G^T_m)^{-1}\mathscr {E}_{a+}. \end{aligned}$$
(9.127)

Note that, with these assumptions, \(GG^T \left( (G^T_m)^{-1}\alpha _1\right) = \sigma _1^2 G_m \alpha _1=0\). Also, note that the matrix \((G_mV)^TG_mV \ge \varepsilon _0^2I\) for all \(N \ge 1\) and some \(\varepsilon _0^2 > 0\). This suggests the approximation \(G_{ap}G^T_{ap}= VV^T\) and hence

$$\begin{aligned} L = (I+GG^*)^{-1} \approx L_A=(I_{N+1} + \varepsilon ^{-2}R^{-1}QG_mVV^TG_m^T)^{-1} \end{aligned}$$
(9.128)

will be accurate for a large number of iterations when N is very large. In particular, \(L_A(G^T_m)^{-1}\alpha _1=(G^T_m)^{-1}\alpha _1\) and hence, if \(e_0\) is written in the form \(e_0 = e_0^{(1)} + e_0^{(2)}\) with \(e_0^{(1)} = G_mV \gamma _1\) and \( e_0^{(2)}=(G^T_m)^{-1}\alpha _1 \gamma _2\) for some scalar \(\gamma _2\) and vector \(\gamma _1 \in \mathscr {R}^N\), a simple computation gives

$$\begin{aligned} e_k = L^ke_0&\approx L_A^k G_mV \gamma _1 + e_0^{(2)} \nonumber \\&= G_mV(I_N+\varepsilon ^{-2}R^{-1}QV^TG^T_mG_mV)^{-k}\gamma _1 + e_0^{(2)}, \end{aligned}$$
(9.129)

and the approximation predicts the convergence to the apparent limit \(e_{\infty }^{pseudo} = e_0^{(2)}\) as \(k \rightarrow \infty \) as \(\Vert (I_N+\varepsilon ^{-2}R^{-1}QV^TG^T_mG_mV)^{-1}\Vert \le (1+ \varepsilon ^{-2}R^{-1}Q \varepsilon _0^2)^{-1} < 1\). This apparent limit vector is a close approximation to the errors that will be observed as the flat-lining phenomenon begins. That is, it is closely associated with the plateau which consists of slow convergence from the value \(\Vert e_0^{(2)}\Vert _{\mathscr {Y}}\) to zero. A simple calculation using orthogonality of \(\mathscr {E}_{1}\) and \(\mathscr {E}_{a+}\) gives

$$\begin{aligned} e_{\infty }^{pseudo} = \frac{\alpha _1^T G_m^{-1}e_0}{\alpha _1^T G_m^{-1}(G^T_m)^{-1}\alpha _1}(G^T_m)^{-1}\alpha _1 = \frac{\alpha _1^T G_m^{-1}e_0}{\alpha _1^T (G^T_mG_m)^{-1}\alpha _1}(G^T_m)^{-1}\alpha _1. \end{aligned}$$
(9.130)

Its value is linked to the vector \(\alpha _1\) but is also influenced by the nature of the minimum-phase component \(G_m\). In this simple case, the nature of the signal can be computed by using the properties of time reversal operators \(\mathscr {T}\) (defined in Sect. 4.3) to write \(G_m^T = \mathscr {T} G_m \mathscr {T}\) and hence \((G_m^T)^{-1} = \mathscr {T} G_m^{-1} \mathscr {T}\). That is, the elements of \((G^T_m)^{-1}\alpha _1\) can be computed by time reversing the first \(N+1\) samples of the signal with \(\mathscr {Z}\)-transform \(G_m^{-1}(z) \hat{\alpha }_1(z)\), where \(\hat{\alpha }_1(z)\) is the \(\mathscr {Z}\)-transform of the signal obtained by time reversing \(\alpha _1\) on \(0 \le t \le N\) and then extending it to the infinite interval. As the time reversal of \(\alpha _1\) is just \(z_1^{-N}\left[ 1, z_1, \ldots , z_1^N \right] ^T\), the natural extension is \(\hat{\alpha }_1(t) = z_1^{-N}z_1^t\) for \(t \ge N+1\) which gives \(\hat{\alpha }_1(z) = z_1^{-N}\left( z/(z-z_1)\right) \). Partial fraction methods indicate that

$$\begin{aligned} G_m^{-1}(z) \hat{\alpha }_1(z) = G_m^{-1}(z_1) \hat{\alpha }_1(z) + z_1^{-N}\psi (z,z_1) \end{aligned}$$
(9.131)

where \(\psi (z,z_1)\) is uniformly bounded on the unit circle \(|z|=1\) and for all N. If N is large, \(z_1^{-N}\) is very small and it follows that the time reversal of \(G_m^{-1} \hat{\alpha }_1\) on \(0 \le t \le N\) is very close to the time series \(G_m(z_1)^{-1}\alpha _1\) which is just

$$\begin{aligned} (G_m^T)^{-1}\alpha _1 \approx G_m(z_1)^{-1}\left[ 1, z_1^{-1}, \ldots , z_1^{-N} \right] ^T \quad when \quad N \gg 0. \end{aligned}$$
(9.132)

This time series is defined by the inverse of the zero. Finally, using this approximation, the plateau is characterized by the limit

$$\begin{aligned} e_{\infty }^{pseudo} = \frac{\alpha _1^T G_m^{-1}e_0}{\alpha _1^T G_m^{-1}(G^T_m)^{-1}\alpha _1}(G^T_m)^{-1}\alpha _1 \approx \frac{\alpha _1^T e_0}{\alpha _1^T \alpha _1}\alpha _1. \end{aligned}$$
(9.133)

This is precisely the limit computed for the stable inverse algorithm in Sect. 8.2.4. The signal is proportional to the stable time series \(\alpha _1 = \left[ 1, z_1^{-1}, \ldots , z_1^{-N} \right] ^T\) with the constant of proportionality

$$\begin{aligned} \frac{\alpha _1^T e_0}{\alpha _1^T \alpha _1} = \left( \frac{1 - z_1^{-2}}{1 - z_1^{-2(N+1)}}\right) \sum _{t=0}^N z_1^{-t}e_0(t) \end{aligned}$$
(9.134)

which is small only if \(z_1\) is large and/or \(e_0(t)\) is small on the initial part of the time interval, taking significant values only when \(|z_1^{-t}| \ll 1\).
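The argument can be checked numerically. The sketch below uses a hypothetical plant \(G(z) = (z-2)/(z-0.8)\) (not taken from the text), builds its supervector matrix, runs the NOILC error recursion \(e_{k+1} = (I+\varepsilon^{-2}GG^T)^{-1}e_k\) with \(Q=R=1\) and compares the observed plateau with the prediction (9.133):

```python
import numpy as np

# A hypothetical SISO example: G(z) = (z - 2)/(z - 0.8) has a non-minimum-phase
# zero at z1 = 2 and D = 1.  The lifted (supervector) matrix G is lower-triangular
# Toeplitz in the impulse response.  The NOILC error recursion
# e_{k+1} = (I + eps^{-2} G G^T)^{-1} e_k (Q = R = 1) is run and the observed
# plateau is compared with the prediction (9.133).
z1, a, N, eps2 = 2.0, 0.8, 30, 1.0

h = np.zeros(N + 1)
h[0] = 1.0
h[1:] = (a - z1) * a ** np.arange(N)          # h(t) = (a - z1) a^{t-1}, t >= 1
G = np.array([[h[i - j] if i >= j else 0.0 for j in range(N + 1)]
              for i in range(N + 1)])

r = np.ones(N + 1)                            # reference; u_0 = 0, d = 0, so e_0 = r
L = np.linalg.inv(np.eye(N + 1) + (1 / eps2) * G @ G.T)

e = r.copy()
for k in range(200):                          # enough iterations to reach the plateau
    e = L @ e

alpha1 = z1 ** (-np.arange(N + 1))
e_pseudo = (alpha1 @ r) / (alpha1 @ alpha1) * alpha1   # approximation (9.133)
print("plateau error norm      :", np.linalg.norm(e))
print("predicted plateau norm  :", np.linalg.norm(e_pseudo))
print("relative mismatch       :", np.linalg.norm(e - e_pseudo) / np.linalg.norm(e_pseudo))
```

For a zero this far outside the unit circle the analysis predicts very close agreement between the observed plateau and the approximation (9.133), the neglected terms being of order \(z_1^{-N}\).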

9.4 Discussion and Further Reading

9.4.1 Background Comments

Optimization-based Iterative Learning Control, in the general form described here, was originally introduced in [6] by the author and co-workers and has led to a number of approaches and proposed generalizations (for example, [17, 44, 64]). Its aim was to create algorithms capable of achieving monotonic convergence for a wide class of linear systems. The use of operator notation in a Hilbert space setting makes this possible with the key assumptions of linearity, boundedness of the operator G and the use of objective functions that are quadratic in the error norm and input change. This structure requires the use of the adjoint operator to compute optimal minimizing solutions described by equations linear in the signals of interest. Crucially, it also permits significant generality whilst providing a natural link with linear, quadratic, optimal control theory for continuous systems [6] and discrete systems [5]. For notational convenience, only examples taken from linear time-invariant, state space systems have been considered but the reader will note that this assumption is easily removed to create convergent NOILC algorithms for “time varying” state space models illustrated by

$$\begin{aligned} \dot{x}(t) = A(t)x(t) + B(t)u(t), \quad x(0)=x_0, \quad y(t) = C(t)x(t) \end{aligned}$$
(9.135)

using the minimization \(u_{k+1}= \arg \min J(u,u_k)\) with weights Q(t) and R(t) as defined in Sect. 9.1.6.
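As a simple illustration of this point, the sketch below (a hypothetical discrete-time, time-varying, scalar-state example; the continuous-time model (9.135) would use the Riccati machinery of Sect. 9.1 rather than the explicit matrices formed here) builds the lifted input-output matrix G directly from \(A(t), B(t), C(t), D(t)\) and applies the NOILC minimization, showing that the update only requires the map G, however it is generated:

```python
import numpy as np

# Hypothetical discrete-time, time-varying SISO system (scalar state):
#   x(t+1) = A(t)x(t) + B(t)u(t),  y(t) = C(t)x(t) + D(t)u(t),  x(0) = 0.
N, eps2 = 40, 0.1
t = np.arange(N + 1)
A = 0.9 + 0.05 * np.sin(0.3 * t)
B = 1.0 + 0.2 * np.cos(0.2 * t)
C = np.ones(N + 1)
D = 0.5 * np.ones(N + 1)

# Lifted matrix: G[i, j] = C(i) A(i-1)...A(j+1) B(j) for i > j, and G[i, i] = D(i).
G = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    G[i, i] = D[i]
    phi = 1.0                           # running product A(i-1)...A(j+1)
    for j in range(i - 1, -1, -1):
        G[i, j] = C[i] * phi * B[j]
        phi *= A[j]

r = np.sin(2 * np.pi * t / (N + 1))     # reference signal; u_0 = 0 and d = 0
u = np.zeros(N + 1)
for k in range(30):
    e = r - G @ u                       # current tracking error e_k
    # u_{k+1} minimizes ||r - G u||^2 + eps2 ||u - u_k||^2 (identity weights)
    u = u + np.linalg.solve(G.T @ G + eps2 * np.eye(N + 1), G.T @ e)
    print(f"k = {k:2d}   ||e_k|| = {np.linalg.norm(e):.3e}")
```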

The potentially wide application to different model types is a good feature of NOILC but it also acts as a source of new ideas. The next chapter considers examples of new algorithms where NOILC theory provides iterative control solutions of problems that introduce additional features to this basic idea and/or relax some of the assumptions. These include multi-rate sampling, variations of initial conditions, the notions of intermediate point and multi-task problems and the use of predictive control methodologies to improve convergence rates using multi-models. Other model types can be considered in these approaches including differential delay systems exemplified by the state equations

$$\begin{aligned} \frac{dx(t)}{dt}&= A(t)x(t) + A_0(t)x(t-\tau ) + B(t)u(t), \nonumber \\ \quad x(t)&= x_0(t),~~ t \in [-\tau , 0]~\quad and \nonumber \\ y(t)&= C(t)x(t) + D(t)u(t), \end{aligned}$$
(9.136)

where \(\tau > 0\) is a state time delay. Further information on time delay systems can be found in, for example, [50]. Note the need for a function \(x_0(t)\) to define the initial condition. This model can be identified with a bounded mapping G with the same input and output spaces as those used in the delay free case. The optimal solution has the same characterization in terms of \(G^*\) but the conversion of the update law into a causal form is more complex.

9.4.2 Practical Observations

Before concluding this chapter, it is worthwhile noting the following points:

  1. 1.

    Applications Potential: The practical viability of NOILC has been verified in applications to physical plant as in [13, 29, 97, 100] and Freeman et al. [42].

  2. 2.

    Nonlinearity: The algorithm \(u_{k+1} = \arg \min J(u,u_k)\) provides monotonic error reductions even if the plant model is nonlinear. The optimization problem is then more computationally demanding, its solution may be non-unique, convergence to the reference signal cannot be guaranteed in general and the analytical tools that permitted a detailed analysis of algorithm properties in the linear case are no longer available.

  3. 3.

    Effect of Parameters: The theory shows the role that Q, R and \(\varepsilon ^2\) play in NOILC algorithms and, in the SISO case, the choice of suitable values can be guided by concepts such as that of spectral bandwidth. In MIMO systems, the choice of the “internal structure” of the matrices Q and R is less clear at a theoretical level. Simple guidelines suggest that Q influences the relative convergence rates in the control loops whilst R provides some control over the changes in control signal at each iteration. Where solutions are non-unique, R can be used to reflect the preferred inputs to be used to solve the tracking problem. For example, if R is diagonal, increasing \(R_{jj}\) will penalize changes in a loop input \(u_j(t)\) and hence encourage the algorithm to use inputs in the other loops to achieve the error reduction sought. These connections are, however, imprecise and detailed choices will often involve a degree of trial and error or experience with the domain of application.

  4. 4.

    Exponentially Weighted Signals: As in the case of inverse and gradient algorithms, the use of exponentially weighted signals can be included in the algorithm in both the continuous time and discrete time cases. For continuous time systems, the objective function, with \(\lambda >0\), is

    $$\begin{aligned} \int _0^T~e^{-2\lambda t}\left( (r(t) - y(t))^TQ(t)(r(t) - y(t)) + (u(t) - u_k(t))^TR(t)(u(t)-u_k(t)) \right) dt \end{aligned}$$
    (9.137)

    and provides (additional) time dependent weighting \(e^{-2\lambda t}\) for S(A, B, C). Equivalently, the weighting can be removed and the system replaced by \(S(A-\lambda I,B,C)\) with the signals u, x, y, r, e replaced by weighted signals defined by the map \(f(t) \mapsto e^{-\lambda t}f(t)\) (a numerical sketch of this equivalence follows this list). This simple modification could, in principle, be used to reduce the plateauing effect caused by non-minimum-phase properties but the nature of the convergence is changed as greater emphasis will be placed on error reductions early in the interval [0, T] before later errors can take any priority.

  5. 5.

    Iteration Variation of Parameters: The parameters Q, R and \(\varepsilon \) could, in practice, be varied from iteration to iteration to speed up or slow down convergence. The theory still guarantees norm reductions at each iteration but, as that norm may be changing as Q changes, little can be said about the nature of the convergence without additional assumptions. Note that the Riccati solution will need to be recomputed whenever any one or more of Q, R and \(\varepsilon \) is changed. This idea will be revisited in Chap. 13 in the “Notch Algorithm” and in the “Parameter Optimal” methods in Chap. 14.
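The following sketch (a hypothetical scalar continuous-time system, integrated by forward Euler purely for illustration) checks the signal-weighting equivalence mentioned in item 4: driving \(S(A-\lambda I, B, C)\) with the weighted input \(e^{-\lambda t}u(t)\) reproduces the weighted output \(e^{-\lambda t}y(t)\) of \(S(A, B, C)\):

```python
import numpy as np

# A quick numerical check of the weighted-signal equivalence of item 4
# (hypothetical scalar system, forward-Euler integration): simulating
# S(A - lambda, B, C) driven by e^{-lambda t}u(t) gives the weighted output
# e^{-lambda t}y(t) of S(A, B, C) for the same initial state.
A, B, C, lam = -1.0, 2.0, 1.0, 0.5
T, dt = 5.0, 1e-3
steps = int(T / dt)
t = np.arange(steps) * dt
u = np.sin(3 * t)                      # an arbitrary test input

def simulate(a, inp, x0=1.0):
    x, y = x0, np.zeros(steps)
    for i in range(steps):
        y[i] = C * x
        x = x + dt * (a * x + B * inp[i])
    return y

y  = simulate(A, u)                                   # output of S(A, B, C)
yw = simulate(A - lam, np.exp(-lam * t) * u)          # output of S(A - lam, B, C)
print(np.max(np.abs(yw - np.exp(-lam * t) * y)))      # ~0, up to Euler discretization error
```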

9.4.3 Performance

It can be argued that NOILC is a benchmark ILC algorithm in the sense that it has defined properties and guaranteed convergence for a wide class of linear applications. Its application to non-minimum-phase (NMP) systems is influenced by the emergence of two-stage convergence. This consists of an initially good convergence rate followed by a rate that is so slow that, in practice, no further benefits can be achieved. For minimum-phase systems, the algorithm is therefore capable of excellent convergence properties but acceptable behaviour in the NMP case will depend on the zero positions and the reference signal, which together determine whether the “plateau” value of \(\Vert e_k\Vert _{\mathscr {Y}}\) represents acceptable tracking accuracy. A more detailed analysis of these phenomena is given in [84] for discrete state space systems and in [86] for the continuous case. The two analyses suggest that large zeros and/or reference signals that are small over a suitably large initial sub-interval can lead to outcomes where tracking accuracy is very good. The validity of the predictions of the analysis has been tested, successfully, in a laboratory environment [86] and also in unreported industrial tests, the outcomes of which are subject to commercial confidentiality agreements.

9.4.4 Robustness and the Inverse Algorithm

Next, as in the case of inverse model and gradient algorithms, modelling errors also influence the nature of the convergence. The interesting conclusion arising in this chapter is that, assuming that robustness will be assessed in terms of monotonicity of signal norms in the presence of multiplicative modelling errors U, the error norm \(\Vert e\Vert _{\mathscr {Y}}\) is not necessarily the most appropriate norm for analysis. For the situations considered, the choice of norm seems to depend on the nature of the modelling error as exemplified by the choice of error norm \(\Vert e\Vert _0 = \sqrt{\langle e,L e \rangle _{\mathscr {Y}}}\) with \(L=(I+\varepsilon ^{-2}GG^*)^{-1}\) for asymptotically stable right multiplicative perturbations. The nature of the new form of monotonicity will therefore, in general, need to be considered to make sure that it is acceptable for the application domain. A unifying theme is that,

the robustness of NOILC can be guaranteed by positivity tests that are algebraically identical to those that guarantee robust convergence of inverse model and/or gradient algorithms,

(Chapters 6 and 7). There seems to be no escaping the need for positivity of the multiplicative modelling error U although, being expressed as a positivity condition on \(U+U^*\), the use of the topology in the relevant Hilbert space (in the form of Q and R for example) may be helpful. The analysis of robustness is technically complex and there may be room for improvements in and extensions to the results presented. The chosen material was aimed at three objectives, namely,

  1. 1.

    that of achieving a degree of generality at the operator theoretical level to illustrate and underline the robustness implicit in the NOILC paradigm and

  2. 2.

    providing guidance on the technical issues that separate the cases where the output space \(\mathscr {Y}\) is either finite or infinite dimensional.

  3. 3.

    Crucially, the approach permits the construction of more familiar frequency domain tests for discrete time, state space systems and, in particular, linking these tests to structural properties of the plant model such as pole and zero positions.

9.4.5 Alternatives?

The reader may have a view on the choices made by the author. Alternatives could modify the form of the objective function or the characterization of the modelling errors, for example, using errors in the parameters in the model [3]. The choice of objective function has received attention [17, 64] with the change

$$\begin{aligned} J(u,u_k)~ \mapsto ~\Vert e\Vert _{\mathscr {Y}}^2 + \varepsilon ^2 \Vert u - u_k\Vert ^2_{\mathscr {U}}~+~ \varepsilon _1^2\Vert u\Vert ^2_{\mathscr {U}},\quad \varepsilon _1^2 > 0, \end{aligned}$$
(9.138)

where the last term is included to constrain the magnitude of the inputs that are used. In this case \(\Vert e\Vert _{\mathscr {Y}}^2 + \varepsilon _1^2\Vert u\Vert ^2_{\mathscr {U}}\) reduces monotonically each iteration and the limit error cannot be zero. The optimal solution on iteration \(k+1\) is

$$\begin{aligned} u_{k+1} = \frac{\varepsilon ^2}{\varepsilon ^2+\varepsilon _1^2}u_k + \frac{1}{\varepsilon ^2+\varepsilon _1^2} G^*e_{k+1} \end{aligned}$$
(9.139)

which is a version of the feedback relaxed NOILC Algorithm 9.2 where \(\varepsilon ^2\) is replaced by \(\varepsilon ^2+\varepsilon _1^2\) and \(\alpha = \frac{\varepsilon ^2}{\varepsilon ^2+\varepsilon _1^2}\).
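A small supervector sketch (hypothetical lifted model G and reference) makes the effect of the extra term visible: iterating the explicit form of (9.139) converges, but to a non-zero limit error, whereas the same data with \(\varepsilon_1 = 0\) would converge to zero error since G is invertible:

```python
import numpy as np

# A small supervector sketch of the modified objective (9.138) with a
# hypothetical lifted plant matrix G: each iteration solves
#   min_u ||r - G u||^2 + eps^2 ||u - u_k||^2 + eps1^2 ||u||^2,
# i.e. (G^T G + (eps^2 + eps1^2) I) u_{k+1} = G^T r + eps^2 u_k,
# which is the explicit form of the relaxed update (9.139).
rng = np.random.default_rng(0)
N, eps2, eps1sq = 20, 1.0, 0.05
G = np.tril(rng.normal(size=(N, N))) + 2.0 * np.eye(N)   # an invertible lifted model
r = rng.normal(size=N)

M = G.T @ G + (eps2 + eps1sq) * np.eye(N)
u = np.zeros(N)
for k in range(500):
    u = np.linalg.solve(M, G.T @ r + eps2 * u)
print("limit error norm with eps1^2 > 0 :", np.linalg.norm(r - G @ u))
# With eps1 = 0 the same data gives convergence to zero error, G being invertible.
```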

As another example, [102] added data from a number of previous iterations and an optimization over a number of parameters in the objective function

$$\begin{aligned} \Vert e\Vert _{\mathscr {Y}}^2 ~+~ \Vert u- \sum _{j=1}^{n_p}\alpha _{k+1,j}u_{k+1-j}\Vert ^2_{\mathscr {U}}, \quad with ~~ \sum _{j=1}^{n_p}\alpha _{k+1,j}=1 \end{aligned}$$
(9.140)

representing linear combinations of \(n_p\) previous input signals. The options are many and include a multi-objective approach proposed by Tousain et al. [108].

9.4.6 Q, R and Dyadic Expansions

Finally, the evolution of individual frequency components in the error provides an interesting approximation, in the frequency domain, in terms of the eigenvalues \(\sigma ^2_j(z)\) of \(G(z)R^{-1}G^T(z^{-1})Q\). This opens up the possibility of using the approximation as a basis for the choice of Q and R, particularly if one frequency \(z_s\) is very important to the design. For example, suppose that \(m=\ell \). Using the concept of Dyadic Expansions [80, 81], consider the case where \(G(z_s)G^{-1}(\overline{z}_s)\) only has eigenvalues \(\{\zeta _j \}_{1 \le j \le m}\) of unit modulus. Then there exist real, nonsingular, \(m \times m\) matrices \(P_1\) and \(P_2\) such that

$$\begin{aligned} G(z_s) = P_1\varLambda P_2 \quad with \quad \varLambda = diag[\zeta _1, \zeta _2, \ldots , \zeta _m]. \end{aligned}$$
(9.141)

Noting that \(G(z)R^{-1}G^T(z^{-1})Q = P_1 \varLambda P_2 R^{-1} P_2^T \overline{\varLambda }P_1^TQ\) and \(\varLambda \overline{\varLambda }=I_m\) then gives,

Theorem 9.20

(NOILC, Q, R and Dyadic Expansions) Suppose that \(m = \ell \), \(|z_s|=1\) and let \(P_0\) be any choice of real, positive definite, diagonal \(m \times m\) matrix. Using the notation and assumptions described above, then \(\sigma _j^2(z_s) =1, ~ 1 \le j \le m\), and hence \(G(z_s)R^{-1}G^T(z_s^{-1})Q=I_m\) if

$$\begin{aligned} \quad P_2R^{-1}P_2^T = P_0 \quad and \quad P_1P_0P_1^TQ = I_m. \end{aligned}$$
(9.142)

The result suggests that the frequency component of the error signal at \(z=z_s\) will be attenuated at a rate \((1+\varepsilon ^{-2})^{-k}\). The degrees of freedom implicit in \(P_0\) leave a number of open questions that would need resolution for wide application. The approach places constraints on the choice of Q and/or R as, for example, it is unlikely that the relations above will allow diagonal choices. Also, [80], it is known that, although Dyadic Expansions apply more generally, the assumption of unit modulus eigenvalues for \(G(z_s)G^{-1}(\overline{z}_s)\) does not describe every situation. Further research is needed to cover such cases.
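The algebra behind Theorem 9.20 is easy to verify numerically. In the sketch below, \(P_1\), \(P_2\), \(P_0\) and \(\varLambda\) are arbitrary hypothetical data (not computed from any particular plant); choosing R and Q according to (9.142) then makes \(G(z_s)R^{-1}G^T(z_s^{-1})Q\) the identity:

```python
import numpy as np

# An algebraic check of the construction (9.142) with hypothetical data: pick
# real nonsingular P1, P2, a positive diagonal P0 and unit-modulus Lambda, set
# G(z_s) = P1 Lambda P2 and choose R, Q from (9.142); the product
# G(z_s) R^{-1} G^T(z_s^{-1}) Q then reduces to the identity.
rng = np.random.default_rng(1)
m = 3
P1 = rng.normal(size=(m, m)) + 2 * np.eye(m)
P2 = rng.normal(size=(m, m)) + 2 * np.eye(m)
P0 = np.diag(rng.uniform(0.5, 2.0, size=m))
Lam = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, size=m)))   # unit-modulus eigenvalues

Gz = P1 @ Lam @ P2                       # G(z_s)
Gz_conj = P1 @ Lam.conj() @ P2           # G(z_s^{-1}) = conjugate of G(z_s) for |z_s| = 1

Rinv = np.linalg.solve(P2, P0) @ np.linalg.inv(P2).T    # so that P2 R^{-1} P2^T = P0
Q = np.linalg.inv(P1 @ P0 @ P1.T)                       # so that P1 P0 P1^T Q = I_m

print(np.allclose(Gz @ Rinv @ Gz_conj.T @ Q, np.eye(m)))    # True
```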