
5.1 Introduction

Least-squares algorithms aim at the minimization of the sum of the squares of the difference between the desired signal and the model filter output [1, 2]. When new samples of the incoming signals are received at every iteration, the solution for the least-squares problem can be computed in recursive form resulting in the recursive least-squares (RLS) algorithms. The conventional version of these algorithms will be the topic of this chapter.

The RLS algorithms are known to pursue fast convergence even when the eigenvalue spread of the input signal correlation matrix is large. These algorithms have excellent performance when working in time-varying environments. All these advantages come with the cost of an increased computational complexity and some stability problems, which are not as critical in LMS-based algorithms [3, 4].

Several properties of the RLS algorithms are discussed, including the misadjustment and the tracking behavior, which are verified through a number of simulation results.

Chapter 16 deals with the quantization effects in the conventional RLS algorithm. Chapter 17 provides an introduction to Kalman filters, a special case of which can be related to the RLS algorithms.

5.2 The Recursive Least-Squares Algorithm

The objective here is to choose the coefficients of the adaptive filter such that the output signal y(k), during the period of observation, will match the desired signal as closely as possible in the least-squares sense. The minimization process makes use of all the input signal information available so far. Also, the objective function we seek to minimize is deterministic.

The generic FIR adaptive filter realized in the direct form is shown in Fig. 5.1. The input signal information vector at a given instant k is given by

$$\mathbf{x}(k) ={ \left [x(k)\:x(k - 1)\ldots x(k - N)\right ]}^{T}$$
(5.1)

where N is the order of the filter. The coefficients w_j(k), for j = 0, 1, …, N, are adapted aiming at the minimization of a given objective function. In the case of least-squares algorithms, the objective function is deterministic and is given by

$$\begin{array}{rcl}{ \xi }^{d}(k)& =& \sum \limits_{i=0}^{k}{\lambda }^{k-i}{\epsilon }^{2}(i) \\ & =& \sum \limits_{i=0}^{k}{\lambda }^{k-i}{\left [d(i) -{\mathbf{x}}^{T}(i)\mathbf{w}(k)\right ]}^{2}\end{array}$$
(5.2)

where w(k) = [w_0(k) w_1(k) … w_N(k)]^T is the adaptive-filter coefficient vector and ε(i) is the a posteriori output error at instant i. The parameter λ is an exponential weighting factor that should be chosen in the range 0 ≪ λ ≤ 1. This parameter is also called the forgetting factor, since the information of the distant past has an increasingly negligible effect on the coefficient updating.

Fig. 5.1 Adaptive FIR filter

It should be noticed that in the development of the LMS and LMS-based algorithms we utilized the a priori error. In the RLS algorithms ε(k) is used to denote the a posteriori error whereas e(k) denotes the a priori error. The a posteriori error will be our first choice in the development of the RLS-based algorithms.

As can be noted, each error consists of the difference between the desired signal and the filter output, using the most recent coefficients w(k). By differentiating ξd(k) with respect to w(k), it follows that

$$\frac{\partial {\xi }^{d}(k)} {\partial \mathbf{w}(k)} = -2\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i)\left [d(i) -{\mathbf{x}}^{T}(i)\mathbf{w}(k)\right ]$$
(5.3)

By equating the result to zero, it is possible to find the optimal vector w(k) that minimizes the least-squares error, through the following relation:

$$\begin{array}{rcl} -\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){\mathbf{x}}^{T}(i)\mathbf{w}(k) +\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i)d(i) = \left [\begin{array}{c} 0\\ 0\\ \vdots \\ 0 \end{array} \right ]& & \\ \end{array}$$

The resulting expression for the optimal coefficient vector w(k) is given by

$$\begin{array}{rcl} \mathbf{w}(k)& =&{ \left [\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){\mathbf{x}}^{T}(i)\right ]}^{-1}\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i)d(i) \\ & =&{ \mathbf{R}}_{D}^{-1}(k){\mathbf{p}}_{ D}(k) \end{array}$$
(5.4)

where R D (k) and p D (k) are called the deterministic correlation matrix of the input signal and the deterministic cross-correlation vector between the input and desired signals, respectively.

In (5.4) it was assumed that R_D(k) is nonsingular. However, if R_D(k) is singular, a generalized inverse [1] should be used instead in order to obtain a solution for w(k) that minimizes ξd(k). Since we are assuming that in most practical applications the input signal has persistence of excitation, the cases requiring a generalized inverse are not discussed here. It should be mentioned that if the input signal is considered to be zero for k < 0, then R_D(k) will always be singular for k < N, i.e., during the initialization period. During this period, the optimal value of the coefficients can be calculated, for example, by the backsubstitution algorithm to be presented in Sect. 9.2.1.
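Before turning to the recursive form, it may help to see (5.4) evaluated directly. The sketch below (Python/NumPy, with hypothetical function and variable names) builds R_D(k) and p_D(k) from the data available up to instant k and solves the resulting linear system; it is intended only as a reference against which a recursive implementation can be checked, not as an efficient algorithm.

```python
import numpy as np

def batch_ls_solution(x, d, N, lam=0.99):
    """Direct evaluation of (5.4): w(k) = R_D^{-1}(k) p_D(k).

    x : samples x(0), ..., x(k); x(n) is taken as zero for n < 0.
    d : desired samples d(0), ..., d(k).
    N : adaptive-filter order (N + 1 coefficients).
    """
    k = len(x) - 1
    R_D = np.zeros((N + 1, N + 1))
    p_D = np.zeros(N + 1)
    for i in range(k + 1):
        # Input vector x(i) = [x(i) x(i-1) ... x(i-N)]^T, zero-padded for i < N
        xi = np.array([x[i - j] if i - j >= 0 else 0.0 for j in range(N + 1)])
        weight = lam ** (k - i)
        R_D += weight * np.outer(xi, xi)
        p_D += weight * xi * d[i]
    # Assumes R_D(k) is nonsingular (persistently exciting input and k >= N)
    return np.linalg.solve(R_D, p_D)
```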

The straightforward computation of the inverse of R D (k) results in an algorithm with computational complexity of O[N 3]. In the conventional RLS algorithm the computation of the inverse matrix is avoided through the use of the matrix inversion lemma [1], first presented in the previous chapter for the LMS-Newton algorithm. Using the matrix inversion lemma, see (4.51), the inverse of the deterministic correlation matrix can then be calculated in the following form

$${ \mathbf{S}}_{D}(k) ={ \mathbf{R}}_{D}^{-1}(k) = \frac{1} {\lambda }\left [{\mathbf{S}}_{D}(k - 1) -\frac{{\mathbf{S}}_{D}(k - 1)\mathbf{x}(k){\mathbf{x}}^{T}(k){\mathbf{S}}_{D}(k - 1)} {\lambda +{ \mathbf{x}}^{T}(k){\mathbf{S}}_{D}(k - 1)\mathbf{x}(k)} \right ]$$
(5.5)

The complete conventional RLS algorithm is described in Algorithm 5.1.

An alternative way to describe the conventional RLS algorithm can be obtained if (5.4) is rewritten in the following form

$$\left [\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){\mathbf{x}}^{T}(i)\right ]\mathbf{w}(k) = \lambda \left [\sum \limits_{i=0}^{k-1}{\lambda }^{k-i-1}\mathbf{x}(i)d(i)\right ] + \mathbf{x}(k)d(k)$$
(5.6)

By considering that \({\mathbf{R}}_{D}(k - 1)\mathbf{w}(k - 1) ={ \mathbf{p}}_{D}(k - 1)\), it follows that

$$\begin{array}{rlrlrl} \left [\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){\mathbf{x}}^{T}(i)\right ]\mathbf{w}(k) & = \lambda {\mathbf{p}}_{ D}(k - 1) + \mathbf{x}(k)d(k) & & \\ & = \lambda {\mathbf{R}}_{D}(k - 1)\mathbf{w}(k - 1) + \mathbf{x}(k)d(k) & & \\ & = \left [\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){\mathbf{x}}^{T}(i) -\mathbf{x}(k){\mathbf{x}}^{T}(k)\right ] & & \\ &\quad \times \mathbf{w}(k - 1) + \mathbf{x}(k)d(k) &\end{array}$$
(5.7)

where in the last equality the matrix x(k)x^T(k) was added and subtracted inside the square brackets on the right-hand side of (5.7). Now, define the a priori error as

$$e(k) = d(k) -{\mathbf{x}}^{T}(k)\mathbf{w}(k - 1)$$
(5.8)

By expressing d(k) as a function of the a priori error and replacing the result in (5.7), after a few manipulations, it can be shown that

$$\mathbf{w}(k) = \mathbf{w}(k - 1) + e(k){\mathbf{S}}_{D}(k)\mathbf{x}(k)$$
(5.9)

With (5.9), it is straightforward to generate an alternative conventional RLS algorithm as shown in Algorithm 5.2.

In Algorithm 5.2, ψ(k) is an auxiliary vector introduced to reduce the computational burden, defined as

$$\begin{array}{rcl} \mathbf{\psi }(k) ={ \mathbf{S}}_{D}(k - 1)\mathbf{x}(k)& &\end{array}$$
(5.10)

Further reduction in the number of divisions is possible if an additional auxiliary vector is used, defined as

$$\begin{array}{rcl} \mathbf{\phi }(k) = \frac{\mathbf{\psi }(k)} {\lambda +{ \mathbf{\psi }}^{T}(k)\mathbf{x}(k)}& &\end{array}$$
(5.11)

This vector can be used to update S D (k) as follows:

$${ \mathbf{S}}_{D}(k) = \frac{1} {\lambda }\left [{\mathbf{S}}_{D}(k - 1) -\mathbf{\psi }(k){\mathbf{\phi }}^{T}(k)\right ]$$
(5.12)

As will be discussed, the above relation can lead to stability problems in the RLS algorithm.
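As an illustration only, a minimal sketch of the recursion of (5.8)–(5.12) in Python/NumPy is given below; the function and variable names and the system-identification usage at the end are hypothetical, and S_D(−1) = δI with a large δ follows the initialization discussed in Sect. 5.3.3.

```python
import numpy as np

def rls_update(S_D, w, x_vec, d_k, lam=0.99):
    """One iteration of the alternative conventional RLS recursion.

    Implements (5.8)-(5.12): auxiliary vectors psi and phi, the S_D(k)
    update, the a priori error, and the coefficient update (5.9).
    """
    psi = S_D @ x_vec                        # (5.10)
    phi = psi / (lam + psi @ x_vec)          # (5.11)
    S_D = (S_D - np.outer(psi, phi)) / lam   # (5.12)
    e = d_k - w @ x_vec                      # a priori error (5.8)
    w = w + e * (S_D @ x_vec)                # coefficient update (5.9)
    return S_D, w, e

# Usage sketch: identify a hypothetical unknown FIR system w_o from noisy data
rng = np.random.default_rng(0)
N, lam, delta = 4, 0.99, 100.0
w_o = rng.standard_normal(N + 1)
S_D, w = delta * np.eye(N + 1), np.zeros(N + 1)
x = np.zeros(N + 1)                          # tap-delay line x(k)
for k in range(500):
    x = np.concatenate(([rng.standard_normal()], x[:-1]))
    d = w_o @ x + 0.01 * rng.standard_normal()
    S_D, w, e = rls_update(S_D, w, x, d, lam)
```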

5.3 Properties of the Least-Squares Solution

In this section, some properties related to the least-squares solution are discussed in order to give some insight into the algorithm behavior in several situations discussed later on.

5.3.1 Orthogonality Principle

Define the matrices X(k) and d(k) that contain all the information about the input signal vector x(k) and the desired signal vector d(k) as follows:

$$\begin{array}{rcl} \mathbf{X}(k)& =& \left [\begin{array}{ccccc} x(k) & {\lambda }^{1/2}x(k - 1) &\cdots &{\lambda }^{(k-1)/2}x(1)&{\lambda }^{k/2}x(0) \\ x(k - 1) & {\lambda }^{1/2}x(k - 2) &\cdots &{\lambda }^{(k-1)/2}x(0)& 0\\ \vdots & \vdots & & \vdots & \vdots \\ x(k - N)&{\lambda }^{1/2}x(k - N - 1)&\cdots & 0 & 0 \end{array} \right ] \\ & =& \left [\mathbf{x}(k)\:{\lambda }^{1/2}\mathbf{x}(k - 1)\ldots {\lambda }^{k/2}\mathbf{x}(0)\right ] \end{array}$$
(5.13)
$$\begin{array}{rcl} \mathbf{d}(k)& =&{ \left [d(k)\:{\lambda }^{1/2}d(k - 1)\ldots {\lambda }^{k/2}d(0)\right ]}^{T}\end{array}$$
(5.14)

where X(k) is \((N + 1) \times (k + 1)\) and d(k) is (k + 1) ×1.

By using the matrices defined above, it is possible to rewrite the least-squares solution of (5.4) as the following relation

$$\mathbf{X}(k){\mathbf{X}}^{T}(k)\mathbf{w}(k) = \mathbf{X}(k)\mathbf{d}(k)$$
(5.15)

The product X T(k)w(k) forms a vector including all the adaptive-filter outputs when the coefficients are given by w(k). This vector corresponds to an estimate of d(k). Hence, defining

$$\mathbf{y}(k) ={ \mathbf{X}}^{T}(k)\mathbf{w}(k) ={ \left [y(k)\:{\lambda }^{1/2}y(k - 1)\ldots {\lambda }^{k/2}y(0)\right ]}^{T}$$
(5.16)

it follows from (5.15) that

$$\begin{array}{rcl} \mathbf{X}(k){\mathbf{X}}^{T}(k)\mathbf{w}(k) -\mathbf{X}(k)\mathbf{d}(k) = \mathbf{X}(k)[\mathbf{y}(k) -\mathbf{d}(k)] = \mathbf{0}& &\end{array}$$
(5.17)

This relation means that the weighted-error vector given by

$$\begin{array}{rcl} \mathbf{\epsilon }(k) = \left [\begin{array}{c} \epsilon (k) \\ {\lambda }^{1/2}\epsilon (k - 1)\\ \vdots \\ {\lambda }^{k/2}\epsilon (0) \end{array} \right ] = \mathbf{d}(k) -\mathbf{y}(k)& &\end{array}$$
(5.18)

is in the null space of X(k), i.e., the weighted-error vector is orthogonal to all row vectors of X(k). This justifies the fact that (5.15) is often called the normal equation. A geometrical interpretation can easily be given for the solution of a least-squares problem with a single-coefficient filter.

Example 5.1.

Suppose that λ = 1 and that the following signals are involved in the least-squares problem

$$\begin{array}{rcl} \mathbf{d}(1) = \left [\begin{array}{c} 0.5\\ 1.5 \end{array} \right ]\ \ \ \mathbf{X}(1) = [1\ \ - 2]& & \\ \end{array}$$

The optimal coefficient is given by

$$\begin{array}{rcl} \mathbf{X}(1){\mathbf{X}}^{T}(1)\mathbf{w}(1)& =& [1\ \ - 2]\left [\begin{array}{c} 1 \\ - 2 \end{array} \right ]\ \ \mathbf{w}(1) \\ & =& \mathbf{X}(1)\mathbf{d}(1) \\ & =& [1\ \ - 2]\left [\begin{array}{c} 0.5\\ 1.5 \end{array} \right ] \\ \end{array}$$

After performing the calculations the result is

$$\begin{array}{rcl} \mathbf{w}(1) = -\frac{1} {2}& & \\ \end{array}$$

The output of the adaptive filter with coefficient given by w(1) is

$$\begin{array}{rcl} \mathbf{y}(1) = \left [\begin{array}{c} -\frac{1} {2} \\ 1 \end{array} \right ]& & \\ \end{array}$$

Note that

$$\begin{array}{rcl} \mathbf{X}(1)[\mathbf{y}(1) -\mathbf{d}(1)]& =& [1\ \ - 2]\left [\begin{array}{c} - 1\\ - 0.5 \end{array} \right ] \\ & =& 0\end{array}$$

Figure 5.2 illustrates the fact that y(1) is the projection of d(1) in the X(1) direction. In the general case we can say that the vector y(k) is the projection of d(k) onto the subspace spanned by the rows of X(k).
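The numbers of Example 5.1 are easy to check numerically; the short NumPy snippet below is merely a verification aid for the normal equation (5.15) and the orthogonality relation (5.17).

```python
import numpy as np

X = np.array([[1.0, -2.0]])            # X(1), 1 x 2
d = np.array([0.5, 1.5])               # d(1)

# Solve the normal equation X X^T w = X d, see (5.15)
w = np.linalg.solve(X @ X.T, X @ d)    # -> [-0.5]
y = X.T @ w                            # output vector y(1) = [-0.5, 1]
print(w, y, X @ (y - d))               # residual X(1)[y(1) - d(1)] is ~0
```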

5.3.2 Relation Between Least-Squares and Wiener Solutions

When λ = 1 the matrix \(\frac{1} {k+1}{\mathbf{R}}_{D}(k)\) for large k is a consistent estimate of the input signal autocorrelation matrix R, if the process from which the input signal was taken is ergodic. The same observation is valid for the vector \(\frac{1} {k+1}{\mathbf{p}}_{D}(k)\) related to p if the desired signal is also ergodic. In this case,

$$\begin{array}{rcl} \mathbf{R}& =& {\lim }_{k\rightarrow \infty } \frac{1} {k + 1}\sum \limits_{i=0}^{k}\mathbf{x}(i){\mathbf{x}}^{T}(i) {=\lim }_{ k\rightarrow \infty } \frac{1} {k + 1}{\mathbf{R}}_{D}(k)\end{array}$$
(5.19)

and

$$\begin{array}{rcl} \mathbf{p}& =& {\lim }_{k\rightarrow \infty } \frac{1} {k + 1}\sum \limits_{i=0}^{k}\mathbf{x}(i)d(i) {=\lim }_{ k\rightarrow \infty } \frac{1} {k + 1}{\mathbf{p}}_{D}(k)\end{array}$$
(5.20)

It can then be shown that

$$\mathbf{w}(k) ={ \mathbf{R}}_{D}^{-1}(k){\mathbf{p}}_{ D}(k) ={ \mathbf{R}}^{-1}\mathbf{p} ={ \mathbf{w}}_{ o}$$
(5.21)

when k tends to infinity. This result indicates that the least-squares solution tends to the Wiener solution if the signals involved are ergodic and stationary. The stationarity requirement is due to the fact that the estimate of R given by (5.19) is not sensitive to any changes in R for large values of k. If the input signal is nonstationary, R_D(k) is a biased estimate for R. Note that in this case R is time varying.
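Assuming a white stationary input and λ = 1, the limits in (5.19) and (5.21) can be observed with a simple Monte Carlo sketch such as the one below; the system w_o, the noise level, and the sample sizes are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k_max = 3, 5000
w_o = np.array([1.0, -0.5, 0.25, 0.1])     # assumed unknown system
x = rng.standard_normal(k_max)             # white stationary input, sigma_x^2 = 1

R_D = np.zeros((N + 1, N + 1))
p_D = np.zeros(N + 1)
for k in range(N, k_max):                  # lambda = 1: plain (unweighted) sums
    xk = x[k - N:k + 1][::-1]              # x(k) = [x(k) x(k-1) ... x(k-N)]^T
    d = w_o @ xk + 0.01 * rng.standard_normal()
    R_D += np.outer(xk, xk)
    p_D += xk * d

print(R_D / (k_max - N))                   # approaches R = I, see (5.19)
print(np.linalg.solve(R_D, p_D))           # approaches w_o, see (5.21)
```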

Fig. 5.2 Geometric interpretation of the least-squares solution

5.3.3 Influence of the Deterministic Autocorrelation Initialization

The initialization of \({\mathbf{S}}_{D}(-1) = \delta \mathbf{I}\) causes a bias in the coefficients estimated by the adaptive filter. Suppose that the initial value given to R D (k) is taken into account in the actual RLS solution as follows:

$$\begin{array}{rcl} \sum \limits_{i=-1}^{k}{\lambda }^{k-i}\mathbf{x}(i){\mathbf{x}}^{T}(i)\mathbf{w}(k)& =& \left [\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){\mathbf{x}}^{T}(i) + \frac{{\lambda }^{k+1}} {\delta } \mathbf{I}\right ]\mathbf{w}(k) \\ & =&{ \mathbf{p}}_{D}(k) \end{array}$$
(5.22)

By recognizing that the deterministic autocorrelation matrix leading to an unbiased solution does not include the initialization matrix, we now examine the influence of this matrix. By premultiplying both sides of (5.22) by S_D(k) = R_D^{-1}(k), it can be concluded that

$$\mathbf{w}(k) + \frac{{\lambda }^{k+1}} {\delta }{ \mathbf{S}}_{D}(k)\mathbf{w}(k) ={ \mathbf{w}}_{o}$$
(5.23)

where w o is the optimal solution for the RLS algorithm.

The bias caused by the initialization S_D(−1) is approximately

$$\mathbf{w}(k) -{\mathbf{w}}_{o} \approx -\frac{{\lambda }^{k+1}} {\delta }{ \mathbf{S}}_{D}(k){\mathbf{w}}_{o}$$
(5.24)

For λ < 1, it is straightforward to conclude that the bias tends to zero as k tends to infinity. On the other hand, when λ = 1 the elements of S_D(k) get smaller as the number of iterations increases; as a consequence, this matrix approaches a null matrix for large k.

The RLS algorithm reaches the optimum solution for the coefficients after N + 1 iterations if no measurement noise is present and the influence of the initialization matrix S_D(−1) is negligible at this point. This result follows from the fact that, after N + 1 iterations, the input signal vector has enough information to allow the adaptive algorithm to identify the coefficients of the unknown system. In other words, enough information means the tap-delay line is filled with information of the input signal.
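This behavior can be illustrated with a short noiseless simulation; the sketch below uses the recursion of (5.5), (5.8), and (5.9) with a large δ so that the initialization bias is negligible, and the chosen order, seed, and variable names are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, lam, delta = 4, 1.0, 1e6           # large delta keeps the initialization bias small
w_o = rng.standard_normal(N + 1)      # assumed unknown system, no measurement noise
S_D, w = delta * np.eye(N + 1), np.zeros(N + 1)
x = np.zeros(N + 1)                   # tap-delay line

for k in range(2 * (N + 1)):
    x = np.concatenate(([rng.standard_normal()], x[:-1]))
    d = w_o @ x                                               # noiseless desired signal
    psi = S_D @ x
    S_D = (S_D - np.outer(psi, psi) / (lam + psi @ x)) / lam  # (5.5)
    e = d - w @ x                                             # a priori error (5.8)
    w = w + e * (S_D @ x)                                     # (5.9)
    print(k, np.linalg.norm(w - w_o))                         # ~0 once k >= N
```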

5.3.4 Steady-State Behavior of the Coefficient Vector

In order to understand better the steady-state behavior of the adaptive-filter coefficients, suppose that an FIR filter with coefficients given by w o is being identified by an adaptive FIR filter of the same order employing an LS algorithm. Also assume that a measurement noise signal n(k) is added to the desired signal before the error signal is calculated as follows:

$$\begin{array}{rcl} d(k) ={ \mathbf{w}}_{o}^{T}\mathbf{x}(k) + n(k)& &\end{array}$$
(5.25)

where the additional noise is considered to be a white noise with zero mean and variance given by σ n 2.

Given the adaptive-filter input vectors x(k), for k = 0, 1, …, we are interested in calculating the average values of the adaptive-filter coefficients w_i(k), for i = 0, 1, …, N. The desired result is the following equality, valid for k ≥ N.

$$\begin{array}{rcl} E[\mathbf{w}(k)]& =& E\left \{{\left [\mathbf{X}(k){\mathbf{X}}^{T}(k)\right ]}^{-1}\mathbf{X}(k)\mathbf{d}(k)\right \} \\ & =& E\left \{{\left [\mathbf{X}(k){\mathbf{X}}^{T}(k)\right ]}^{-1}\mathbf{X}(k)[{\mathbf{X}}^{T}(k){\mathbf{w}}_{ o} + \mathbf{n}(k)]\right \} \\ & =& E\left \{{\left [\mathbf{X}(k){\mathbf{X}}^{T}(k)\right ]}^{-1}\mathbf{X}(k){\mathbf{X}}^{T}(k){\mathbf{w}}_{ o}\right \} ={ \mathbf{w}}_{o}\end{array}$$
(5.26)

where \(\mathbf{n}(k) = {[n(k)\ {\lambda }^{1/2}n(k - 1)\ \lambda n(k - 2)\ \ldots \ {\lambda }^{k/2}n(0)]}^{T}\) is the noise vector, whose elements were considered orthogonal to the input signal. The above equation shows that the estimate given by the LS algorithm is an unbiased estimate when λ ≤ 1.

A more accurate analysis reveals the behavior of the adaptive-filter coefficients during the transient period. The error in the filter coefficients can be described by the following (N + 1) ×1 vector

$$\Delta \mathbf{w}(k) = \mathbf{w}(k) -{\mathbf{w}}_{o}$$
(5.27)

It follows from (5.7) that

$${ \mathbf{R}}_{D}(k)\mathbf{w}(k) = \lambda {\mathbf{R}}_{D}(k - 1)\mathbf{w}(k - 1) + \mathbf{x}(k)d(k)$$
(5.28)

Defining the minimum output error as

$${e}_{o}(k) = d(k) -{\mathbf{x}}^{T}(k){\mathbf{w}}_{ o}$$
(5.29)

and replacing d(k) in (5.28), it can be deduced that

$${ \mathbf{R}}_{D}(k)\Delta \mathbf{w}(k) = \lambda {\mathbf{R}}_{D}(k - 1)\Delta \mathbf{w}(k - 1) + \mathbf{x}(k){e}_{o}(k)$$
(5.30)

where the following relation was used

$${ \mathbf{R}}_{D}(k) = \lambda {\mathbf{R}}_{D}(k - 1) + \mathbf{x}(k){\mathbf{x}}^{T}(k)$$
(5.31)

The solution of (5.30) is given by

$$\begin{array}{rcl} \Delta \mathbf{w}(k)& =& {\lambda }^{k+1}{\mathbf{S}}_{ D}(k){\mathbf{R}}_{D}(-1)\Delta \mathbf{w}(-1) +{ \mathbf{S}}_{D}(k)\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){e}_{ o}(i)\end{array}$$
(5.32)

By replacing R D ( − 1) by \(\frac{1} {\delta }\) I and taking the expected value of the resulting equation, it follows that

$$\begin{array}{rcl} E[\Delta \mathbf{w}(k)]=\frac{{\lambda }^{k+1}} {\delta } E[{\mathbf{S}}_{D}(k)]\Delta \mathbf{w}(-1)+E\left [{\mathbf{S}}_{D}(k)\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){e}_{ o}(i)\right ]& &\end{array}$$
(5.33)

Since S_D(k) is dependent on all past input signal vectors, becoming relatively invariant as the number of iterations increases, the contribution of any individual x(i) can be considered negligible. Also, due to the orthogonality principle, e_o(i) can be considered uncorrelated to all elements of x(i). This means that the last vector in (5.33) cannot have large element values. On the other hand, the first vector in (5.33) can have large element values only during the initial convergence, since, as k → ∞, λ^{k+1} → 0 and S_D(k) is expected to have a nonincreasing behavior, i.e., R_D(k) is assumed to remain positive definite as k → ∞ while the input signal power does not become too small. The above discussion leads to the conclusion that the adaptive-filter coefficients tend to the optimal values in w_o almost independently of the eigenvalue spread of the input signal correlation matrix.

If we consider the spectral decomposition of the matrix E[S D (k)] (see (2.65)), the dependency on the eigenvalues of R can be easily accounted for in the simple case of λ = 1. Applying the expected value operator to the relation of (5.19), we can infer that

$$E[{\mathbf{S}}_{D}(k)] \approx \frac{{\mathbf{R}}^{-1}} {(k + 1)}$$
(5.34)

for large k. Now consider the slowest decaying mode of the spectral decomposition of E[S D (k)] given by

$${ \mathbf{S}}_{{D}_{\mathrm{max}}} = \frac{{\mathbf{q}}_{\mathrm{min}}{\mathbf{q}}_{\mathrm{min}}^{T}} {(k + 1){\lambda }_{\mathrm{min}}}$$
(5.35)

where λmin is the smallest eigenvalue of R and q min is the corresponding eigenvector. Applying this result to (5.33), with λ = 1, we can conclude that the value of the minimum eigenvalue affects the convergence of the filter coefficients only in the first few iterations, because the term k + 1 in the denominator reduces the values of the elements of \({\mathbf{S}}_{{D}_{\mathrm{max}}}\).

Further interesting properties of the coefficients generated by the LS algorithm are:

  • The estimated coefficients are the best linear unbiased solution to the identification problem [1], in the sense that no other unbiased solution generated by alternative approaches has lower variance.

  • If the additive noise is normally distributed the LS solution reaches the Cramer-Rao lower bound, resulting in a minimum-variance unbiased solution [1]. The Cramer-Rao lower bound establishes a lower bound to the coefficient-error-vector covariance matrix for any unbiased estimator of the optimal parameter vector w o .

5.3.5 Coefficient-Error-Vector Covariance Matrix

So far, we have shown that the parameter estimates in the vector w(k) converge on average to the optimal values in the vector w_o. However, it is essential to analyze the coefficient-error-vector covariance matrix in order to determine how good the obtained solution is, in the sense of measuring how far the parameters wander around the optimal solution.

Using the same convergence assumption of the last section, it will be shown here that for λ = 1 the coefficient-error-vector covariance matrix is given by

$$\mathrm{cov}\,\left [\Delta \mathbf{w}(k)\right ] = E\left [\left (\mathbf{w}(k) -{\mathbf{w}}_{o}\right ){(\mathbf{w}(k) -{\mathbf{w}}_{o})}^{T}\right ] = {\sigma }_{ n}^{2}E[{\mathbf{S}}_{ D}(k)]$$
(5.36)

Proof.

First note that by using (5.4) and (5.15), the following relations are verified

$$\begin{array}{rcl} \mathbf{w}(k) -{\mathbf{w}}_{o}& =&{ \mathbf{S}}_{D}(k){\mathbf{p}}_{D}(k) -{\mathbf{S}}_{D}(k){\mathbf{S}}_{D}^{-1}(k){\mathbf{w}}_{ o}\end{array}$$
(5.37)
$$\begin{array}{rcl} & =&{ \left [\mathbf{X}(k){\mathbf{X}}^{T}(k)\right ]}^{-1}\mathbf{X}(k)\left [\mathbf{d}(k) -{\mathbf{X}}^{T}(k){\mathbf{w}}_{ o}\right ]\end{array}$$
(5.38)
$$\begin{array}{rcl} & =&{ \left [\mathbf{X}(k){\mathbf{X}}^{T}(k)\right ]}^{-1}\mathbf{X}(k)\mathbf{n}(k)\end{array}$$
(5.39)

where \(\mathbf{n}(k) = {[n(k)\ {\lambda }^{1/2}n(k - 1)\ \lambda n(k - 2)\ \ldots \ {\lambda }^{k/2}n(0)]}^{T}\).

Applying the last equation to the covariance of the coefficient-error-vector it follows that

$$\begin{array}{rcl} \mathrm{cov}\,[\Delta \mathbf{w}(k)]& =& E\left \{{\left [\mathbf{X}(k){\mathbf{X}}^{T}(k)\right ]}^{-1}\mathbf{X}(k)E[\mathbf{n}(k){\mathbf{n}}^{T}(k)]{\mathbf{X}}^{T}(k){\left [\mathbf{X}(k){\mathbf{X}}^{T}(k)\right ]}^{-1}\right \} \\ & =& E\left \{{\sigma }_{n}^{2}{\mathbf{S}}_{ D}(k)\mathbf{X}(k)\Lambda {\mathbf{X}}^{T}(k){\mathbf{S}}_{ D}(k)\right \} \\ \end{array}$$

where

$$\begin{array}{rcl} \Lambda = \left [\begin{array}{ccccc} 1\\ &\lambda & &\mathbf{0} \\ & &{\lambda }^{2} \\ & \mathbf{0}& \ddots \\ & & & &{\lambda }^{k} \end{array} \right ]& & \\ \end{array}$$

For λ = 1, Λ = I, it follows that

$$\begin{array}{rcl} \mathrm{cov}\,[\Delta \mathbf{w}(k)]& =& E\left [{\sigma }_{n}^{2}{\mathbf{S}}_{ D}(k)\mathbf{X}(k){\mathbf{X}}^{T}(k){\mathbf{S}}_{ D}(k)\right ] \\ & =& E\left [{\sigma }_{n}^{2}{\mathbf{S}}_{ D}(k){\mathbf{R}}_{D}(k){\mathbf{S}}_{D}(k)\right ] \\ & =& {\sigma }_{n}^{2}E\left [{\mathbf{S}}_{ D}(k)\right ] \\ \end{array}$$

Therefore, when λ = 1, the coefficient-error-vector covariance matrix tends to decrease its norm as time progresses, since S_D(k) is also norm decreasing. The variance of the additional noise n(k) directly influences the norm of the covariance matrix.
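Equation (5.36) can be verified by ensemble averaging over independent runs. The sketch below, under the white measurement-noise model of (5.25) and with illustrative parameter values and names, compares the sample covariance of Δw(k) with σ_n² times the average of S_D(k) for λ = 1.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k_max, runs, sigma_n = 3, 200, 400, 0.1
w_o = np.array([0.5, -1.0, 0.8, 0.2])        # assumed unknown system
dw_outer = np.zeros((N + 1, N + 1))
S_D_mean = np.zeros((N + 1, N + 1))

for _ in range(runs):
    x = rng.standard_normal(k_max + N)
    R_D = np.zeros((N + 1, N + 1))
    p_D = np.zeros(N + 1)
    for k in range(k_max):                   # lambda = 1
        xk = x[k:k + N + 1][::-1]            # tap-delay-line vector x(k)
        d = w_o @ xk + sigma_n * rng.standard_normal()
        R_D += np.outer(xk, xk)
        p_D += xk * d
    S_D = np.linalg.inv(R_D)
    dw = S_D @ p_D - w_o                     # coefficient-error vector
    dw_outer += np.outer(dw, dw) / runs
    S_D_mean += S_D / runs

print(dw_outer)                              # estimate of cov[Delta w(k)]
print(sigma_n ** 2 * S_D_mean)               # sigma_n^2 E[S_D(k)], see (5.36)
```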

5.3.6 Behavior of the Error Signal

It is important to understand how the error signal behaves in the RLS algorithm. When a measurement noise is present in the adaptive-filtering process, the a priori error signal is given by

$$e(k) = d^{\prime}(k) -{\mathbf{w}}^{T}(k - 1)\mathbf{x}(k) + n(k)$$
(5.40)

where d′(k) = w_o^T x(k) is the desired signal without measurement noise.

Again if the input signal is considered known (conditional expectation), then

$$\begin{array}{rcl} E[e(k)]& =& E[d^{\prime}(k)] - E\left [{\mathbf{w}}^{T}(k - 1)\right ]\mathbf{x}(k) + E[n(k)] \\ & =& E\left [{\mathbf{w}}_{o}^{T} -{\mathbf{w}}_{ o}^{T}\right ]\mathbf{x}(k) + E[n(k)] \\ & =& E[n(k)] \end{array}$$
(5.41)

assuming that the adaptive-filter order is sufficient to model perfectly the desired signal.

From (5.41), it can be concluded that if the noise signal has zero mean, then

$$E[e(k)] = 0$$

It is also important to assess the minimum mean value of the squared error that is reachable using an RLS algorithm. The minimum mean-square error (MSE) in the presence of external uncorrelated noise is given by

$${\xi }_{\mathrm{min}} = E[{e}^{2}(k)] = E[{e}_{ o}^{2}(k)] = E[{n}^{2}(k)] = {\sigma }_{ n}^{2}$$
(5.42)

where it is assumed that the adaptive-filter multiplier coefficients were frozen at their optimum values and that the number of coefficients of the adaptive filter is sufficient to model the desired signal. In the conditions described the a priori error corresponds to the minimum output error as defined in (5.29). It should be noted, however, that if the additive noise is correlated with the input and the desired signals, a more complicated expression for the MSE results, accounting for the referred correlation.

When employing the a posteriori error the value of minimum MSE, denoted by ξmin,p, differs from the corresponding value related to the a priori error. First note that by using (5.39), the following relation is verified

$$\begin{array}{rcl} \Delta \mathbf{w}(k) ={ \mathbf{S}}_{D}(k)\mathbf{X}(k)\mathbf{n}(k)& &\end{array}$$
(5.43)

When a measurement noise is present in the adaptive-filtering process, the a posteriori error signal is given by

$$\begin{array}{rcl} \epsilon (k) = d^{\prime}(k) -{\mathbf{w}}^{T}(k)\mathbf{x}(k) + n(k) = -\Delta {\mathbf{w}}^{T}(k)\mathbf{x}(k) + {e}_{ o}(k)& &\end{array}$$
(5.44)

The expression for the MSE related to the a posteriori error is then given by

$$\begin{array}{rcl} \xi (k)& =& E[{\epsilon }^{2}(k)] \\ & =& E[{e}_{o}^{2}(k)] - 2E[{\mathbf{x}}^{T}(k)\Delta \mathbf{w}(k){e}_{ o}(k)] + E[\Delta {\mathbf{w}}^{T}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\Delta \mathbf{w}(k)]\end{array}$$
(5.45)

By replacing the expression (5.43) in (5.45) above, the following relations follow

$$\begin{array}{rcl} \xi (k)& =& E\left [{e}_{o}^{2}(k)\right ] - 2E\left [{\mathbf{x}}^{T}(k){\mathbf{S}}_{ D}(k)\mathbf{X}(k)\mathbf{n}(k){e}_{o}(k)\right ] \\ & & +E\left [\Delta {\mathbf{w}}^{T}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\Delta \mathbf{w}(k)\right ] \\ & =& {\sigma }_{n}^{2} - 2E\left [{\mathbf{x}}^{T}(k){\mathbf{S}}_{ D}(k)\mathbf{X}(k)\right ]\left [\begin{array}{c} {\sigma }_{n}^{2} \\ 0\\ \vdots\\ 0 \end{array} \right ] + E\left [\Delta {\mathbf{w}}^{T}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\Delta \mathbf{w}(k)\right ] \\ & =& {\sigma }_{n}^{2} - 2E\left [{\mathbf{x}}^{T}(k){\mathbf{S}}_{ D}(k)\mathbf{x}(k)\right ]{\sigma }_{n}^{2} + E\left [\Delta {\mathbf{w}}^{T}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\Delta \mathbf{w}(k)\right ] \\ & =& {\xi }_{\mathrm{min,p}} + E\left [\Delta {\mathbf{w}}^{T}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\Delta \mathbf{w}(k)\right ] \end{array}$$
(5.46)

where in the second equality it was considered that the additional noise is uncorrelated with the input signal and that e o (k) = n(k). This equality occurs when the adaptive filter has sufficient order to identify the unknown system.

Note that ξmin,p related to the a posteriori error in (5.46) is not the same as the minimum MSE of the a priori error, denoted in this book by ξmin. The last term in (5.46), that is E[Δw^T(k)x(k)x^T(k)Δw(k)], determines the excess MSE of the RLS algorithm.

It is possible to verify that the following expressions for ξmin,p are accurate approximations

$$\begin{array}{rcl}{ \xi }_{\mathrm{min,p}}& =& \left \{1 - 2E\left [{\mathbf{x}}^{T}(k){\mathbf{S}}_{ D}(k)\mathbf{x}(k)\right ]\right \}{\sigma }_{n}^{2} \\ & =& \left \{1 - 2\mathrm{tr}\left [E\left ({\mathbf{S}}_{D}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\right )\right ]\right \}{\sigma }_{ n}^{2} \\ & =& \left \{1 - 2\mathrm{tr}\left [ \frac{1 - \lambda } {1 - {\lambda }^{k+1}}\mathbf{I}\right ]\right \}{\sigma }_{n}^{2} \\ & =& \left \{1 - 2(N + 1)\left [ \frac{1 - \lambda } {1 - {\lambda }^{k+1}}\right ]\right \}{\sigma }_{n}^{2} \\ & =& \left \{1 - 2(N + 1)\left [ \frac{1} {1 + \lambda + {\lambda }^{2} + \cdots + {\lambda }^{k}}\right ]\right \}{\sigma }_{n}^{2}\end{array}$$
(5.47)

In the above expression, it is considered that S D (k) is slowly varying as compared to x(k) when λ → 1, such that

$$\begin{array}{rcl} E\left [{\mathbf{S}}_{D}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\right ] \approx E\left [{\mathbf{S}}_{ D}(k)\right ]E\left [\mathbf{x}(k){\mathbf{x}}^{T}(k)\right ]& & \\ \end{array}$$

and that by using (5.55)

$$\begin{array}{rcl} E\left [{\mathbf{S}}_{D}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\right ] \approx \frac{1 - \lambda } {1 - {\lambda }^{k+1}}\mathbf{I}& & \\ \end{array}$$

Equation (5.47) applies to the case where λ < 1, and as can be observed from the term multiplying N + 1 there is a transient for small k which dies away as the number of iterations increases. If we fit the decrease in the term multiplying N + 1 at each iteration to an exponential envelope, the time constant will be \(\frac{1} {{\lambda }^{k+1}}\). Unlike the LMS algorithm, this time constant is time varying and is not related to the eigenvalue spread of the input signal correlation matrix.

Example 5.2.

Repeat the equalization problem of Example 3.1 using the RLS algorithm.

  1. (a)

    Using λ = 0.99, run the algorithm, save the matrix S_D(k) at iteration 500, and compare it with the inverse of the input signal correlation matrix.

  2. (b)

    Plot the convergence path for the RLS algorithm on the MSE surface.

Solution.

  1. (a)

    The inverse of matrix R, as computed in the Example 3.1, is given by

    $$\begin{array}{rcl}{ \mathbf{R}}^{-1}& =& 0.45106\left [\begin{array}{cc} 1.6873&0.7937 \\ 0.7937&1.6873\\ \end{array} \right ] = \left [\begin{array}{cc} 0.7611&0.3580\\ 0.3580 &0.7611\\ \end{array} \right ]\\ \end{array}$$

    The initialization matrix S_D(−1) is a diagonal matrix with the diagonal elements equal to 0.1. The matrix S_D(k) at the 500th iteration, obtained by averaging the results of 30 experiments, is

    $$\begin{array}{rcl}{ \mathbf{S}}_{D}(500)& =& \left [\begin{array}{cc} 0.0078&0.0037\\ 0.0037 &0.0078\\ \end{array} \right ]\\ \end{array}$$

    Also, the obtained value of the deterministic cross-correlation vector is

    $$\begin{array}{rcl}{ \mathbf{p}}_{D}(500)& =& \left [\begin{array}{c} 95.05\\ 46.21\\ \end{array} \right ]\\ \end{array}$$

    Now, we divide each element of the matrix R^{-1} by

    $$\begin{array}{rcl} \frac{1 - {\lambda }^{k+1}} {1 - \lambda } = 99.34& & \\ \end{array}$$

    since in a stationary environment \(E[{\mathbf{S}}_{D}(k)] = \frac{1-\lambda } {1-{\lambda }^{k+1}}{ \mathbf{R}}^{-1}\), see (5.55) for a formal proof.

    The resulting matrix is

    $$\begin{array}{rcl} \frac{1} {99.34}{\mathbf{R}}^{-1}& =& \left [\begin{array}{cc} 0.0077&0.0036 \\ 0.0036&0.0077\\ \end{array} \right ]\\ \end{array}$$

    As can be noted the values of the elements of the above matrix are close to the average values of the corresponding elements of matrix S D (500).

    Similarly, if we multiply the cross-correlation vector p by 99.34, the result is

    $$\begin{array}{rcl} 99.34\mathbf{p}& =& \left [\begin{array}{c} 94.61\\ 47.31\\ \end{array} \right ]\\ \end{array}$$

    The values of the elements of this vector are also close to the corresponding elements of p D (500).

  2. (b)

    The convergence path of the RLS algorithm on the MSE surface is depicted in Fig. 5.3. The reader should notice that the RLS algorithm approaches the minimum using large steps when the coefficients of the adaptive filter are far away from the optimum solution. □

5.3.7 Excess Mean-Square Error and Misadjustment

In a practical implementation of the recursive least-squares algorithm, the best estimation for the unknown parameter vector is given by w(k), whose expected value is w o . However, there is always an excess MSE at the output caused by the error in the coefficient estimation, namely \(\Delta \mathbf{w}(k) = \mathbf{w}(k) -{\mathbf{w}}_{o}\). The mean-square error is (see (5.46))

$$\begin{array}{rcl} \xi (k)& =& {\xi }_{\mathrm{min,p}} + E\left \{{\left [\mathbf{w}(k) -{\mathbf{w}}_{o}\right ]}^{T}\mathbf{x}(k){\mathbf{x}}^{T}(k)\left [\mathbf{w}(k) -{\mathbf{w}}_{ o}\right ]\right \} \\ & =& {\xi }_{\mathrm{min,p}} + E\left [\Delta {\mathbf{w}}^{T}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\Delta \mathbf{w}(k)\right ] \end{array}$$
(5.48)

Now considering that Δw_j(k), for j = 0, 1, …, N, are random variables with zero mean and independent of x(k), the MSE can be calculated as follows

$$\begin{array}{rcl} \xi (k)& =& {\xi }_{\mathrm{min,p}} + E\left [\Delta {\mathbf{w}}^{T}(k)\mathbf{R}\Delta \mathbf{w}(k)\right ] \\ & =& {\xi }_{\mathrm{min,p}} + E\left \{\mathrm{tr}\,\left [\mathbf{R}\Delta \mathbf{w}(k)\Delta {\mathbf{w}}^{T}(k)\right ]\right \} \\ & =& {\xi }_{\mathrm{min,p}} + \mathrm{tr}\,\left \{\mathbf{R}E\left [\Delta \mathbf{w}(k)\Delta {\mathbf{w}}^{T}(k)\right ]\right \} \\ & =& {\xi }_{\mathrm{min,p}} + \mathrm{tr}\,\left \{\mathbf{R}\mathrm{cov}\,\left [\Delta \mathbf{w}(k)\right ]\right \} \end{array}$$
(5.49)
Fig. 5.3 Convergence path of the RLS adaptive filter

On a number of occasions it is interesting to consider the analysis for λ = 1 separately from that for λ < 1.

5.3.7.1 Excess MSE for λ = 1

By applying in (5.49) the results of (5.36) and (5.19), and considering that

$$\begin{array}{rcl}{ \xi }_{\mathrm{min,p}} = \left (1 - 2\frac{N + 1} {k + 1} \right ){\xi }_{\mathrm{min}} = \left (1 - 2\frac{N + 1} {k + 1} \right ){\sigma }_{n}^{2}& & \\ \end{array}$$

for λ = 1 (see (5.42) and (5.47)), we can infer that

$$\begin{array}{rcl} \xi (k)& =& \left [1 - 2\frac{N + 1} {k + 1} \right ]{\sigma }_{n}^{2} + \mathrm{tr}\,\left \{\mathbf{R}E[{\mathbf{S}}_{ D}(k)]\right \}{\sigma }_{n}^{2} \\ & =& \left [1 - 2\frac{N + 1} {k + 1} + \mathrm{tr}\,\left (\mathbf{R} \frac{{\mathbf{R}}^{-1}} {k + 1}\right )\right ]{\sigma }_{n}^{2}\:\:\mathrm{for}\:\:k \rightarrow \infty \\ & =& \left (1 - 2\frac{N + 1} {k + 1} + \frac{N + 1} {k + 1} \right ){\sigma }_{n}^{2}\:\:\mathrm{for}\:\:k \rightarrow \infty \\ & =& \left (1 -\frac{N + 1} {k + 1} \right ){\sigma }_{n}^{2}\:\:\mathrm{for}\:\:k \rightarrow \infty \\ \end{array}$$

As can be noted the minimum MSE can be reached only after the algorithm has operated on a number of samples larger than the filter order.

5.3.7.2 Excess MSE for λ < 1

Again assuming that the mean-square error surface is quadratic as considered in (5.48), the expected excess MSE is then defined by

$$\Delta \xi (k) = E\left [\Delta {\mathbf{w}}^{T}(k)\mathbf{R}\Delta \mathbf{w}(k)\right ]$$
(5.50)

The objective now is to calculate and analyze the excess MSE when λ < 1. From (5.30) one can show that

$$\Delta \mathbf{w}(k) = \lambda {\mathbf{S}}_{D}(k){\mathbf{R}}_{D}(k - 1)\Delta \mathbf{w}(k - 1) +{ \mathbf{S}}_{D}(k)\mathbf{x}(k){e}_{o}(k)$$
(5.51)

By applying (5.51) to (5.50), it follows that

$$E\left [\Delta {\mathbf{w}}^{T}(k)\mathbf{R}\Delta \mathbf{w}(k)\right ] = {\rho }_{ 1} + {\rho }_{2} + {\rho }_{3} + {\rho }_{4}$$
(5.52)

where

$$\begin{array}{rcl}{ \rho }_{1}& =& {\lambda }^{2}E\left [\Delta {\mathbf{w}}^{T}(k - 1){\mathbf{R}}_{ D}(k - 1){\mathbf{S}}_{D}(k)\mathbf{R}{\mathbf{S}}_{D}(k){\mathbf{R}}_{D}(k - 1)\Delta \mathbf{w}(k - 1)\right ] \\ {\rho }_{2}& =& \lambda E\left [\Delta {\mathbf{w}}^{T}(k - 1){\mathbf{R}}_{ D}(k - 1){\mathbf{S}}_{D}(k)\mathbf{R}{\mathbf{S}}_{D}(k)\mathbf{x}(k){e}_{o}(k)\right ] \\ {\rho }_{3}& =& \lambda E\left [{\mathbf{x}}^{T}(k){\mathbf{S}}_{ D}(k)\mathbf{R}{\mathbf{S}}_{D}(k){\mathbf{R}}_{D}(k - 1)\Delta \mathbf{w}(k - 1){e}_{o}(k)\right ] \\ {\rho }_{4}& =& E\left [{\mathbf{x}}^{T}(k){\mathbf{S}}_{ D}(k)\mathbf{R}{\mathbf{S}}_{D}(k)\mathbf{x}(k){e}_{o}^{2}(k)\right ] \\ \end{array}$$

Now each term in (5.52) will be evaluated separately.

1. Evaluation of ρ1

    First note that, as k → ∞, it can be assumed that R_D(k) ≈ R_D(k − 1); then

    $${\rho }_{1} \approx {\lambda }^{2}E\left [\Delta {\mathbf{w}}^{T}(k - 1)\mathbf{R}\Delta \mathbf{w}(k - 1)\right ]$$
    (5.53)
2. Evaluation of ρ2

    Since each element of R D (k) is given by

    $$\begin{array}{rcl}{ r}_{d,ij}(k)& =& \sum \limits_{l=0}^{k}{\lambda }^{k-l}x(l - i)x(l - j) \end{array}$$
    (5.54)

for 0 ≤ i, j ≤ N. Therefore,

$$\begin{array}{rcl} E[{r}_{d,ij}(k)] =\sum \limits_{l=0}^{k}{\lambda }^{k-l}E[x(l - i)x(l - j)]& & \\ \end{array}$$

If x(k) is stationary, \(r(i - j) = E[x(l - i)x(l - j)]\) is independent of the value l, then

$$E[{r}_{d,ij}(k)] = r(i - j)\frac{1 - {\lambda }^{k+1}} {1 - \lambda } \approx \frac{r(i - j)} {1 - \lambda }$$
(5.55)

Equation (5.55) allows the conclusion that

$$E[{\mathbf{R}}_{D}(k)] \approx \frac{1} {1 - \lambda }E\left [\mathbf{x}(k){\mathbf{x}}^{T}(k)\right ] = \frac{1} {1 - \lambda }\mathbf{R}$$
(5.56)

In each step, it can be considered that

$${ \mathbf{R}}_{D}(k) = \frac{1} {1 - \lambda }\mathbf{R} + \Delta \mathbf{R}(k)$$
(5.57)

where Δ R(k) is a symmetric error matrix with zero-mean stochastic entries that are independent of the input signal. From (5.56) and (5.57), it can be concluded that

$${ \mathbf{S}}_{D}(k)\mathbf{R} \approx (1 - \lambda )\left [\mathbf{I} - (1 - \lambda ){\mathbf{R}}^{-1}\Delta \mathbf{R}(k)\right ]$$
(5.58)

where in the last relation S D (k)Δ R(k) was considered approximately equal to

$$(1 - \lambda ){\mathbf{R}}^{-1}\Delta \mathbf{R}(k)$$

by using (5.56) and disregarding second-order errors.

In the long run, it is known that E[S_D(k)R] = (1 − λ)I, which means that the second term inside the square brackets in (5.58) is a measure of the perturbation caused by ΔR(k) in the product S_D(k)R. Denoting this perturbation by ΔI(k), that is

$$\begin{array}{rcl} \Delta \mathbf{I}(k) = (1 - \lambda ){\mathbf{R}}^{-1}\Delta \mathbf{R}(k)& &\end{array}$$
(5.59)

it can be concluded that

$$\begin{array}{rcl}{ \rho }_{2}& \approx & \lambda (1 - \lambda )E\left \{\Delta {\mathbf{w}}^{T}(k - 1)\left [\mathbf{I} - \Delta {\mathbf{I}}^{T}(k)\right ]\mathbf{x}(k){e}_{ o}(k)\right \} \\ & \approx & \lambda (1 - \lambda )E\left [\Delta {\mathbf{w}}^{T}(k - 1)\right ]E[\mathbf{x}(k){e}_{ o}(k)] = 0\end{array}$$
(5.60)

where it was considered that Δw^T(k − 1) is independent of x(k) and e_o(k), that ΔI(k) is an independent error matrix with zero mean, and finally that x(k) and e_o(k) are orthogonal.

3. Evaluation of ρ3

Following a similar approach, it can be shown that

$$\begin{array}{rcl}{ \rho }_{3}& \approx & \lambda (1 - \lambda )E\left \{{\mathbf{x}}^{T}(k)\left [\mathbf{I} - \Delta \mathbf{I}(k)\right ]\Delta \mathbf{w}(k - 1){e}_{ o}(k)\right \} \\ & \approx & \lambda (1 - \lambda )E\left [{\mathbf{x}}^{T}(k){e}_{ o}(k)\right ]E\left [\Delta \mathbf{w}(k - 1)\right ] = 0\end{array}$$
(5.61)

4. Evaluation of ρ4

$$\begin{array}{rcl}{ \rho }_{4}& =& E\left [{\mathbf{x}}^{T}(k){\mathbf{S}}_{ D}(k)\mathbf{R}{\mathbf{S}}_{D}(k)\mathbf{R}{\mathbf{R}}^{-1}\mathbf{x}(k){e}_{ o}^{2}(k)\right ] \\ & \approx & {(1 - \lambda )}^{2}E\left \{{\mathbf{x}}^{T}(k){\left [\mathbf{I} - \Delta \mathbf{I}(k)\right ]}^{2}{\mathbf{R}}^{-1}\mathbf{x}(k)\right \}{\xi }_{\mathrm{ min}}\end{array}$$
(5.62)

where (5.58) and (5.29) were used and e o (k) was considered independent of x(k) and Δ I(k). By using the property that

$$\begin{array}{rcl} E\left \{{\mathbf{x}}^{T}(k){[\mathbf{I} - \Delta \mathbf{I}(k)]}^{2}{\mathbf{R}}^{-1}\mathbf{x}(k)\right \} = \mathrm{tr}\,E\left \{{[\mathbf{I} - \Delta \mathbf{I}(k)]}^{2}{\mathbf{R}}^{-1}\mathbf{x}(k){\mathbf{x}}^{T}(k)\right \}& & \\ \end{array}$$

and recalling that Δ I(k) has zero mean and is independent of x(k), then (5.62) is simplified to

$${\rho }_{4} = {(1 - \lambda )}^{2}\mathrm{tr}\,\{\mathbf{I} + E[\Delta {\mathbf{I}}^{2}(k)]\}{\xi }_{\mathrm{ min}}$$
(5.63)

where tr[ ⋅] means trace of [ ⋅], and we utilized the fact that \(E\{{\mathbf{R}}^{-1}\mathbf{x}(k){\mathbf{x}}^{T}(k)\} = \mathbf{I}\).

By using (5.53), (5.60), and (5.63), it follows that

$$\begin{array}{rcl} E[\Delta {\mathbf{w}}^{T}(k)\mathbf{R}\Delta \mathbf{w}(k)]& =& {\lambda }^{2}E[\Delta {\mathbf{w}}^{T}(k - 1)\mathbf{R}\Delta \mathbf{w}(k - 1)] \\ & & +{(1 - \lambda )}^{2}\mathrm{tr}\,\{\mathbf{I} + E[\Delta {\mathbf{I}}^{2}(k)]\}{\xi }_{\mathrm{ min}}\end{array}$$
(5.64)

Asymptotically, the solution of the above equation is

$${\xi }_{\mathrm{exc}} = \frac{1 - \lambda } {1 + \lambda }\mathrm{tr}\,\left \{\mathbf{I} + E\left [\Delta {\mathbf{I}}^{2}(k)\right ]\right \}{\xi }_{\mathrm{ min}}$$
(5.65)

Note that the term given by E[ΔI^2(k)] is not easy to estimate and is dependent on fourth-order statistics of the input signal. However, in specific situations, it is possible to compute an approximate estimate for this matrix. In steady state, it can be considered for a white noise input signal that only the diagonal elements of R and ΔR are important to the generation of excess MSE. Even when the input signal is not white, this diagonal dominance can be considered a reasonable approximation in most cases. From the definition of ΔI(k) in (5.59), it follows that

$$E[\Delta {\mathbf{I}}_{ii}^{2}(k)] = {(1 - \lambda )}^{2}\frac{E[\Delta {r}_{ii}^{2}(k)]} {{[{\sigma }_{x}^{2}]}^{2}}$$
(5.66)

where σ_x^2 is the variance of x(k). By calculating ΔR(k) − λΔR(k − 1) using (5.57), we show that

$$\Delta {r}_{ii}(k) = \lambda \Delta {r}_{ii}(k - 1) + x(k - i)x(k - i) - {r}_{ii}$$
(5.67)

Squaring the above equation, applying the expectation operation, and using the independence between Δr ii (k) and x(k), it follows that

$$\begin{array}{rcl} E\left [\Delta {r}_{ii}^{2}(k)\right ]& =& {\lambda }^{2}E\left [\Delta {r}_{ ii}^{2}(k - 1)\right ] + E\left \{{\left [x(k - i)x(k - i) - {r}_{ ii}\right ]}^{2}\right \}\end{array}$$
(5.68)

Therefore, asymptotically

$$E\left [\Delta {r}_{ii}^{2}(k)\right ] = \frac{1} {1 - {\lambda }^{2}}{\sigma }_{{x}^{2}(k-i)}^{2} = \frac{1} {1 - {\lambda }^{2}}{\sigma }_{{x}^{2}}^{2}$$
(5.69)

By substituting (5.69) in (5.66), it becomes

$$E\left [\Delta {\mathbf{I}}_{ii}^{2}(k)\right ] = \frac{1 - \lambda } {1 + \lambda } \frac{{\sigma }_{{x}^{2}}^{2}} {{({\sigma }_{x}^{2})}^{2}} = \frac{1 - \lambda } {1 + \lambda }\mathcal{K}$$
(5.70)

where \(\mathcal{K} = \frac{{\sigma }_{{x}^{2}}^{2}} {{({\sigma }_{x}^{2})}^{2}}\) is dependent on input signal statistics. For Gaussian signals, \(\mathcal{K} = 2\) [5].

Returning to our main objective, the excess MSE can then be described as

$${\xi }_{\mathrm{exc}} = (N + 1)\frac{1 - \lambda } {1 + \lambda }\left (1 + \frac{1 - \lambda } {1 + \lambda }\mathcal{K}\right ){\xi }_{\mathrm{min}}$$
(5.71)

If λ is approximately one and \(\mathcal{K}\) is not very large, then

$${\xi }_{\mathrm{exc}} = (N + 1)\frac{1 - \lambda } {1 + \lambda }{\xi }_{\mathrm{min}}$$
(5.72)

This expression can also be obtained through a simpler analysis [6]. However, the more complete derivation shown here gives more insight into the interpretation of the results obtained with the RLS algorithm, mainly when λ is not very close to one.

The misadjustment formula can be deduced from (5.71)

$$M = \frac{{\xi }_{\mathrm{exc}}} {{\xi }_{\mathrm{min}}} = (N + 1)\frac{1 - \lambda } {1 + \lambda }\left (1 + \frac{1 - \lambda } {1 + \lambda }\mathcal{K}\right )$$
(5.73)

As can be noted, decreasing λ from one brings a fourth-order statistics term into the picture and increases the misadjustment. Hence, the faster adaptation of the RLS algorithm, corresponding to a smaller λ, comes at the price of a noisier steady-state response. Therefore, when working in a stationary environment, the best choice for λ would be one if the excess MSE in steady state obtained for other values of λ is considered too high. However, other problems, such as instability due to quantization noise, are prone to occur when λ = 1.
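For design purposes, (5.73) is straightforward to evaluate and to invert numerically for λ. The sketch below (plain Python, hypothetical function names) computes the misadjustment for given λ, N, and K, and finds by bisection the forgetting factor that yields a target misadjustment; K = 2 assumes Gaussian input signals, and the tolerance is an arbitrary choice.

```python
def misadjustment(lam, N, K=2.0):
    """Misadjustment of the RLS algorithm, (5.73); K = 2 for Gaussian inputs."""
    a = (1.0 - lam) / (1.0 + lam)
    return (N + 1) * a * (1.0 + a * K)

def lambda_for_misadjustment(M_target, N, K=2.0, tol=1e-9):
    """Find lambda in (0, 1) giving the target misadjustment, by bisection."""
    lo, hi = 1e-6, 1.0 - 1e-12        # M decreases monotonically as lambda -> 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if misadjustment(mid, N, K) > M_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(lambda_for_misadjustment(0.1, 9))   # about 0.98, as in Example 5.3
```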

5.4 Behavior in Nonstationary Environments

In cases where the input signal and/or the desired signal are nonstationary, the optimal values of the coefficients are time variant and described by w_o(k). That means the autocorrelation matrix R(k) and/or the cross-correlation vector p(k) are time variant. For example, typically in a system identification application the autocorrelation matrix R(k) is time invariant while the cross-correlation vector p(k) is time variant, because in this case the designer can choose the input signal. On the other hand, in equalization, prediction, and signal enhancement applications both the input and the desired signals are nonstationary, leading to time-varying R(k) and p(k).

The objective in the present section is to analyze how closely the RLS algorithm is able to track the time-varying solution w_o(k). Also, it is of interest to learn how the tracking error in w(k) affects the output MSE [5]. Here, the effects of the measurement noise are not considered, since only the nonstationarity effects are of interest. Also, both effects on the MSE can be added since, in general, they are independent.

Recall from (5.8) and (5.9) that

$$\mathbf{w}(k) = \mathbf{w}(k - 1) +{ \mathbf{S}}_{D}(k)\mathbf{x}(k)[d(k) -{\mathbf{x}}^{T}(k)\mathbf{w}(k - 1)]$$
(5.74)

and

$$d(k) ={ \mathbf{x}}^{T}(k){\mathbf{w}}_{ o}(k - 1) + {e^{\prime}}_{o}(k)$$
(5.75)

The error signal e′_o(k) is the minimum error at iteration k, generated by the nonstationarity of the environment. One can replace (5.75) in (5.74) in order to obtain the following relation

$$\begin{array}{rcl} \mathbf{w}(k)& =& \mathbf{w}(k - 1) +{ \mathbf{S}}_{D}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)[{\mathbf{w}}_{ o}(k - 1) -\mathbf{w}(k - 1)] +{ \mathbf{S}}_{D}(k)\mathbf{x}(k){e^{\prime}}_{o}(k)\end{array}$$
(5.76)

By taking the expected value of (5.76), considering that x(k) and e′ o (k) are approximately orthogonal, and that w(k − 1) is independent of x(k), then

$$\begin{array}{rcl} E\left [\mathbf{w}(k)\right ]& =& E\left [\mathbf{w}(k - 1)\right ] + E\left [{\mathbf{S}}_{D}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\right ]\left \{{\mathbf{w}}_{ o}(k - 1) - E[\mathbf{w}(k - 1)]\right \}\end{array}$$
(5.77)

We now need to compute E[S_D(k)x(k)x^T(k)] for the case of a nonstationary input signal. From (5.54) and (5.56), one can show that

$${ \mathbf{R}}_{D}(k) =\sum \limits_{l=0}^{k}{\lambda }^{k-l}\mathbf{R}(l) + \Delta \mathbf{R}(k)$$
(5.78)

since \(E[{\mathbf{R}}_{D}(k)] = \sum \limits_{l=0}^{k}{\lambda }^{k-l}\mathbf{R}(l)\). The matrix Δ R(k) is again considered a symmetric error matrix with zero-mean stochastic entries that are independent of the input signal.

If the environment is considered to be varying at a slower pace than the memory of the adaptive RLS algorithm, then

$${ \mathbf{R}}_{D}(k) \approx \frac{1} {1 - \lambda }\mathbf{R}(k) + \Delta \mathbf{R}(k)$$
(5.79)

Considering that \((1 - \lambda )\vert \vert {\mathbf{R}}^{-1}(k)\Delta \mathbf{R}(k)\vert \vert < 1\) and using the same procedure to deduce (5.58), we obtain

$${ \mathbf{S}}_{D}(k) \approx (1 - \lambda ){\mathbf{R}}^{-1}(k) - {(1 - \lambda )}^{2}{\mathbf{R}}^{-1}(k)\Delta \mathbf{R}(k){\mathbf{R}}^{-1}(k)$$
(5.80)

it then follows that

$$\begin{array}{rcl} E\left [\mathbf{w}(k)\right ]& =& E\left [\mathbf{w}(k - 1)\right ] + \left \{(1 - \lambda )E\left [{\mathbf{R}}^{-1}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\right ]\right. \\ & & -\left.{(1 - \lambda )}^{2}E\left [{\mathbf{R}}^{-1}(k)\Delta \mathbf{R}(k){\mathbf{R}}^{-1}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\right ]\right \}\left \{{\mathbf{w}}_{ o}(k - 1) - E\left [\mathbf{w}(k - 1)\right ]\right \} \\ & \approx & E\left [\mathbf{w}(k - 1)\right ] + (1 - \lambda )\left \{{\mathbf{w}}_{o}(k - 1) - E\left [\mathbf{w}(k - 1)\right ]\right \} \end{array}$$
(5.81)

where it was considered that Δ R(k) is independent of x(k) and has zero expected value.

Now define the lag-error vector in the coefficients as

$${ \mathbf{l}}_{\mathbf{w}}(k) = E[\mathbf{w}(k)] -{\mathbf{w}}_{o}(k)$$
(5.82)

From (5.81), it can be concluded that

$${ \mathbf{l}}_{\mathbf{w}}(k) = \lambda {\mathbf{l}}_{\mathbf{w}}(k - 1) -{\mathbf{w}}_{o}(k) +{ \mathbf{w}}_{o}(k - 1)$$
(5.83)

Equation (5.83) is equivalent to saying that the lag is generated by passing the optimal instantaneous value w_o(k) through a first-order discrete-time filter as follows:

$${L}_{i}(z) = -\frac{z - 1} {z - \lambda }{W}_{oi}(z)$$
(5.84)

The discrete-time filter transient response converges with a time constant given by

$$\tau = \frac{1} {1 - \lambda }$$
(5.85)

The time constant is of course the same for each individual coefficient. Note that the tracking ability of the coefficients in the RLS algorithm is independent of the eigenvalues of the input signal correlation matrix.

The lag in the coefficients leads to an excess MSE. In order to calculate the MSE suppose that the optimal coefficient values are first-order Markov processes described by

$${ \mathbf{w}}_{o}(k) = {\lambda }_{\mathbf{w}}{\mathbf{w}}_{o}(k - 1) +{ \mathbf{n}}_{\mathbf{w}}(k)$$
(5.86)

where n_w(k) is a vector whose elements are zero-mean white-noise processes with variance σ_w^2, and λ_w < 1. Note that λ < λ_w < 1, since the optimal coefficient values must vary more slowly than the filter tracking speed, which means that \(\frac{1} {1-\lambda } < \frac{1} {1-{\lambda }_{\mathbf{w}}}\).

The excess MSE due to lag is then given by (see the derivations around (3.41))

$$\begin{array}{rcl}{ \xi }_{\mathrm{lag}}& =& E\left [{\mathbf{l}}_{\mathbf{w}}^{T}(k)\mathbf{R}{\mathbf{l}}_{\mathbf{ w}}(k)\right ] \\ & =& E\left \{\mathrm{tr}\,\left [\mathbf{R}{\mathbf{l}}_{\mathbf{w}}(k){\mathbf{l}}_{\mathbf{w}}^{T}(k)\right ]\right \} \\ & =& \mathrm{tr}\,\left \{\mathbf{R}E\left [{\mathbf{l}}_{\mathbf{w}}(k){\mathbf{l}}_{\mathbf{w}}^{T}(k)\right ]\right \} \\ & =& \mathrm{tr}\left \{\mathbf{\Lambda }E\left [{\mathbf{l}}_{\mathbf{w}}^{\prime}(k){\mathbf{l}{}_{\mathbf{w}}^{\prime}}^{T}(k)\right ]\right \} \\ & =& \sum \limits_{i=0}^{N}{\lambda }_{ i}E\left [{l}_{i}^{^{\prime}2}(k)\right ]\end{array}$$
(5.87)

For λ w not close to one, it is a bit more complicated to deduce the excess MSE due to lag than for λ w ≈ 1. However, the effort is worth it because the resulting expression is more accurate. From (5.84), we can see that the lag-error vector elements are generated by applying a first-order discrete-time system to the elements of the unknown system coefficient vector. On the other hand, the coefficients of the unknown system are generated by applying each element of the noise vector n w (k) to a first-order all-pole filter, with the pole placed at λ w . For the unknown coefficient vector with the above model, the lag-error vector elements can be generated by applying the elements of the noise vector n w (k) to a discrete-time filter with transfer function

$$H(z) ={ -(z - 1)z \over (z - \lambda )(z - {\lambda }_{\mathbf{w}})}$$
(5.88)

This transfer function consists of a cascade of the lag filter with the all-pole filter representing the first-order Markov process. The solution for the variance of the lag terms l i can be computed through the inverse \(\mathcal{Z}\)-transform as follows:

$$\begin{array}{rcl} E[{l}_{i}^{^{\prime}2}(k)] ={ 1 \over 2\pi \mathrm{J}} \oint \nolimits H(z)H({z}^{-1}){\sigma }_{\mathbf{ w}}^{2}{z}^{-1}\ dz& &\end{array}$$
(5.89)

The above integral can be solved using the residue theorem as previously shown in the LMS algorithm case.

Using the solution for the variance of the lag terms of (5.89) for values of λ w < 1, and substituting the result in the last term of (5.87) it can be shown that

$$\begin{array}{rcl}{ \xi }_{\mathrm{lag}}& \approx & \frac{\mathrm{tr}\,[\mathbf{R}]{\sigma }_{\mathbf{w}}^{2}} {{\lambda }_{\mathbf{w}}(1 + {\lambda }^{2}) - \lambda (1 + {\lambda }_{\mathbf{w}}^{2})}\left (\frac{1 - \lambda } {1 + \lambda } -\frac{1 - {\lambda }_{\mathbf{w}}} {1 + {\lambda }_{\mathbf{w}}}\right ) \\ & =& \frac{(N + 1){\sigma }_{\mathbf{w}}^{2}{\sigma }_{x}^{2}} {{\lambda }_{\mathbf{w}}(1 + {\lambda }^{2}) - \lambda (1 + {\lambda }_{\mathbf{w}}^{2})}\left (\frac{1 - \lambda } {1 + \lambda } -\frac{1 - {\lambda }_{\mathbf{w}}} {1 + {\lambda }_{\mathbf{w}}}\right )\end{array}$$
(5.90)

where we used the fact that \(\mathrm{tr}[\mathbf{R}] =\sum \limits_{i=0}^{N}{\lambda }_{i} = (N + 1){\sigma }_{x}^{2}\) for a tap-delay line. It should be noticed that assumptions such as the correlation matrix R being diagonal and the input signal being white noise were not required in this derivation.

If λ = 1 and λ w ≈ 1, the MSE due to lag tends to infinity indicating that the RLS algorithm in this case cannot track any change in the environment. On the other hand, for λ < 1 the algorithm can track variations in the environment, leading to an excess MSE that depends on the variance of the optimal coefficient disturbance and on the input signal variance.

For λ w = 1 and λ ≈ 1, it is possible to rewrite (5.90) as

$${\xi }_{\mathrm{lag}} \approx (N + 1) \frac{{\sigma }_{\mathbf{w}}^{2}} {2(1 - \lambda )}{\sigma }_{x}^{2}$$
(5.91)

The total excess MSE accounting for the lag and finite memory is given by

$${\xi }_{\mathrm{total}} \approx (N + 1)\left [\frac{1 - \lambda } {1 + \lambda }{\xi }_{\mathrm{min}} + \frac{{\sigma }_{\mathbf{w}}^{2}{\sigma }_{x}^{2}} {2(1 - \lambda )}\right ]$$
(5.92)

By differentiating the above equation with respect to λ and setting the result to zero, an optimum value for λ can be found that yields minimum excess MSE.

$${\lambda }_{\mathrm{opt}} = \frac{1 -\frac{{\sigma }_{\mathbf{w}}{\sigma }_{x}} {2{\sigma }_{n}} } {1 + \frac{{\sigma }_{\mathbf{w}}{\sigma }_{x}} {2{\sigma }_{n}} }$$
(5.93)

In the above equation we used \({\sigma }_{n} = \sqrt{{\xi }_{\mathrm{min }}}\). Note that the optimal value of λ does not depend on the adaptive-filter order N, and can be used when it falls in an acceptable range of values for λ. Also, this value is optimum only when quantization effects are not important and the first-order Markov model (with λ w ≈ 1) is a good approximation for the nonstationarity of the desired signal.
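The trade-off expressed by (5.92) and the optimal forgetting factor of (5.93) can be explored with a few lines of code; the sketch below uses illustrative values for σ_w, σ_x, and σ_n and hypothetical function names.

```python
def xi_total(lam, N, xi_min, sigma_w2, sigma_x2):
    """Total excess MSE (5.92): finite-memory term plus lag term."""
    return (N + 1) * ((1 - lam) / (1 + lam) * xi_min
                      + sigma_w2 * sigma_x2 / (2 * (1 - lam)))

def lambda_opt(sigma_w, sigma_x, sigma_n):
    """Optimal forgetting factor (5.93), with sigma_n = sqrt(xi_min)."""
    r = sigma_w * sigma_x / (2 * sigma_n)
    return (1 - r) / (1 + r)

# Illustrative values only
sigma_w, sigma_x, sigma_n, N = 0.001, 1.0, 0.1, 9
lam_o = lambda_opt(sigma_w, sigma_x, sigma_n)
print(lam_o, xi_total(lam_o, N, sigma_n ** 2, sigma_w ** 2, sigma_x ** 2))
```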

When implemented with finite-precision arithmetic, the conventional RLS algorithm behavior can differ significantly from what is expected under infinite precision. A series of inconvenient effects can show up in the practical implementation of the conventional RLS algorithm, such as divergence and freezing in the updating of the adaptive-filter coefficients. Chapter 16 presents a detailed analysis of the finite-wordlength effects in the RLS algorithm.

5.5 Complex RLS Algorithm

In the complex data case the RLS objective function is given by

$$\begin{array}{rcl}{ \xi }^{d}(k)& =& \sum \limits_{i=0}^{k}{\lambda }^{k-i}\vert \epsilon (i){\vert }^{2} =\sum \limits_{i=0}^{k}{\lambda }^{k-i}\vert d(i) -{\mathbf{w}}^{H}(k)\mathbf{x}(i){\vert }^{2} \\ & =& \sum \limits_{i=0}^{k}{\lambda }^{k-i}\left [d(i) -{\mathbf{w}}^{H}(k)\mathbf{x}(i)\right ]\left [{d}^{{_\ast}}(i) -{\mathbf{w}}^{T}(k){\mathbf{x}}^{{_\ast}}(i)\right ]\end{array}$$
(5.94)

Differentiating ξd(k) with respect to the complex coefficient vector w*(k) leads to

$$\frac{\partial {\xi }^{d}(k)} {\partial {\mathbf{w}}^{{_\ast}}(k)} = -\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i)\left [{d}^{{_\ast}}(i) -{\mathbf{w}}^{T}(k){\mathbf{x}}^{{_\ast}}(i)\right ]$$
(5.95)

The optimal vector w(k) that minimizes the least-squares error is computed by equating the above equation to zero, that is,

$$\begin{array}{rcl} -\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){\mathbf{x}}^{H}(i)\mathbf{w}(k) +\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){d}^{{_\ast}}(i) = \left [\begin{array}{c} 0\\ 0\\ \vdots \\ 0 \end{array} \right ]& & \\ \end{array}$$

leading to the following expression

$$\begin{array}{rcl} \mathbf{w}(k)& =&{ \left [\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){\mathbf{x}}^{H}(i)\right ]}^{-1}\sum \limits_{i=0}^{k}{\lambda }^{k-i}\mathbf{x}(i){d}^{{_\ast}}(i) \\ & =&{ \mathbf{R}}_{D}^{-1}(k){\mathbf{p}}_{ D}(k) \end{array}$$
(5.96)

The matrix inversion lemma for the case of complex data is given by

$${ \mathbf{S}}_{D}(k) ={ \mathbf{R}}_{D}^{-1}(k) = \frac{1} {\lambda }\left [{\mathbf{S}}_{D}(k - 1) -\frac{{\mathbf{S}}_{D}(k - 1)\mathbf{x}(k){\mathbf{x}}^{H}(k){\mathbf{S}}_{D}(k - 1)} {\lambda +{ \mathbf{x}}^{H}(k){\mathbf{S}}_{D}(k - 1)\mathbf{x}(k)} \right ]$$
(5.97)

The complete conventional RLS algorithm for complex data is described in Algorithm 5.3.

An alternative complex RLS algorithm has an updating equation described by

$$\mathbf{w}(k) = \mathbf{w}(k - 1) + {e}^{{_\ast}}(k){\mathbf{S}}_{ D}(k)\mathbf{x}(k)$$
(5.98)

where

$$e(k) = d(k) -{\mathbf{w}}^{H}(k - 1)\mathbf{x}(k)$$
(5.99)

With (5.98), it is straightforward to generate an alternative conventional RLS algorithm for complex data, as shown in Algorithm 5.4.
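As an illustration, the following NumPy sketch implements the alternative complex RLS recursion defined by (5.97)–(5.99). It assumes zero initial coefficients and the usual initialization S D ( − 1) = δ I; the function name, the default values, and the loop structure are illustrative choices and do not reproduce the book's pseudo-code.

```python
import numpy as np

def complex_rls(x, d, order, lam=0.99, delta=100.0):
    """Minimal sketch of the alternative complex RLS recursion (5.97)-(5.99):
    update S_D(k) via the matrix inversion lemma, compute the a priori error
    e(k) = d(k) - w^H(k-1) x(k), then w(k) = w(k-1) + e*(k) S_D(k) x(k)."""
    n1 = order + 1
    w = np.zeros(n1, dtype=complex)             # w(-1) = 0
    S = delta * np.eye(n1, dtype=complex)       # S_D(-1) = delta * I
    xk = np.zeros(n1, dtype=complex)            # x(k) = [x(k) x(k-1) ... x(k-N)]^T
    e = np.zeros(len(x), dtype=complex)
    for k in range(len(x)):
        xk = np.concatenate(([x[k]], xk[:-1]))  # shift in the newest input sample
        Sx = S @ xk                             # S_D(k-1) x(k)
        S = (S - np.outer(Sx, xk.conj() @ S) / (lam + xk.conj() @ Sx)) / lam  # (5.97)
        e[k] = d[k] - w.conj() @ xk             # a priori error (5.99)
        w = w + np.conj(e[k]) * (S @ xk)        # coefficient update (5.98)
    return w, e
```

A large δ corresponds to a weak regularization of the initial deterministic correlation matrix, which is the usual choice when no prior information about the coefficients is available.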

5.6 Examples

In this section, some examples illustrating the performance of the conventional RLS algorithm are discussed.

5.6.1 Analytical Example

Example 5.3.

Assume that an adaptive filter of sufficient order is employed to identify an unknown system of order N, and produces a misadjustment of 10%. Assume the input signal is white Gaussian noise with unit variance and σ n 2 = 0.001.

  1. (a)

    Compute the value of λ required by the RLS algorithm in order to achieve the desired result when N = 9.

  2. (b)

    For values in the range 0.9 < λ < 0.99, which orders should the adaptive filters have?

Solution.

  1. (a)

    The desired misadjustment expression as per (5.73) is

    $$\begin{array}{rcl} M = 0.1 = (N + 1)\frac{1 - \lambda } {1 + \lambda }\left (1 + \frac{1 - \lambda } {1 + \lambda }\mathcal{K}\right ) = 10a(1 + 2a)& & \\ \end{array}$$

    where \(a = \frac{1-\lambda } {1+\lambda }\) and \(\mathcal{K} = 2\). By solving this equation we obtain

    $$\begin{array}{rcl} a = \frac{-\frac{1} {2} \pm \sqrt{\frac{1} {4} + 0.02}} {2} & & \\ \end{array}$$

    where the valid solution is

    $$\begin{array}{rcl} a = \frac{1} {4}\left (-1 + \sqrt{1 + 0.08}\right ) = 0.0098076& & \\ \end{array}$$

    then solving for λ

    $$\begin{array}{rcl} \lambda = \frac{1 - a} {1 + a} = 0.98058& & \\ \end{array}$$

    By employing the simplest expression of (5.72) we obtain

    $$\begin{array}{rcl} \lambda = \frac{1 - \frac{M} {(N+1)}} {1 + \frac{M} {(N+1)}} = \frac{1 - 1{0}^{-2}} {1 + 1{0}^{-2}} = 0.98& & \\ \end{array}$$

    where M is the misadjustment.

  2. (b)

    Since from (5.73)

    $$\begin{array}{rcl} \frac{1} {N + 1} = \frac{1} {M} \frac{1 - \lambda } {1 + \lambda }\left (1 + \frac{1 - \lambda } {1 + \lambda }\mathcal{K}\right ) = 10a(1 + 2a)& & \\ \end{array}$$

    for λ = 0.90, a = 0.052631578

    $$\begin{array}{rcl} \frac{1} {N + 1} = 0.5817& & \\ \end{array}$$

    so that N = 0.7190 and as a result only one coefficient can be employed in the adaptive filter. For λ = 0.99, a = 0.005025125,

    $$\begin{array}{rcl} \frac{1} {N + 1} = 0.05075& & \\ \end{array}$$

    so that N = 18.7 and as a result 19 coefficients can be employed in the adaptive filter.

    Using the simplest expression for M, derived from (5.72), the results are almost the same, since

    $$\begin{array}{rcl} N = M \frac{1 + \lambda } {1 - \lambda } - 1& & \\ \end{array}$$

    for λ = 0.90, N = 0.9, meaning that only an adaptive filter with one coefficient would be able to achieve the desired misadjustment for this value of λ. For λ = 0.99, N = 18.9, meaning that adaptive filters up to order 18 would be able to achieve the desired misadjustment for this value of λ. A short numerical check of these calculations is sketched below.
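The algebra in Example 5.3 can be verified with a few lines of Python. The script below is a simple numerical check, not part of the original solution; it reproduces parts (a) and (b) using the misadjustment expression of (5.73) with K = 2 for Gaussian white input.

```python
import numpy as np

def misadjustment(N, lam, K=2.0):
    # M = (N + 1) * a * (1 + a*K), with a = (1 - lam) / (1 + lam), as in (5.73).
    a = (1.0 - lam) / (1.0 + lam)
    return (N + 1) * a * (1.0 + a * K)

M, N, K = 0.1, 9, 2.0

# Part (a): positive root of K*a^2 + a - M/(N+1) = 0, then map back to lambda.
a = (-1.0 + np.sqrt(1.0 + 4.0 * K * M / (N + 1))) / (2.0 * K)
lam = (1.0 - a) / (1.0 + a)
print(a, lam, misadjustment(N, lam))        # ~0.0098076, ~0.9806, ~0.1

# Part (b): largest filter order achieving M = 0.1 for a given lambda.
for lam in (0.90, 0.99):
    a = (1.0 - lam) / (1.0 + lam)
    print(lam, M / (a * (1.0 + a * K)) - 1.0)   # N ~ 0.72 and N ~ 18.7
```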

5.6.2 System Identification Simulations

In the following subsections, some adaptive-filtering problems described in the last two chapters are solved using the conventional RLS algorithm presented in this chapter.

Example 5.4.

The conventional RLS algorithm is employed in the identification of the system described in Sect. 3.6.2. The forgetting factor is chosen as λ = 0.99.

Solution.

In the first test, we address the sensitivity of the RLS algorithm to the eigenvalue spread of the input signal correlation matrix. The measured simulation results are obtained by ensemble averaging 200 independent runs. The learning curves of the mean-squared a priori error are depicted in Fig. 5.4 for different values of the eigenvalue spread. Also, the measured misadjustment in each example is given in Table 5.1. From these results, we conclude that the RLS algorithm is insensitive to the eigenvalue spread. It is worth mentioning at this point that the convergence speed of the RLS algorithm is affected by the choice of λ, since a smaller value of λ leads to faster convergence while increasing the misadjustment in a stationary environment. Table 5.1 also shows the misadjustment predicted by theory, calculated using the relation repeated below. As can be seen from this table, the analytical results agree with those obtained through simulations.

$$\begin{array}{rcl} M = (N + 1)\frac{1 - \lambda } {1 + \lambda }\left (1 + \frac{1 - \lambda } {1 + \lambda }\mathcal{K}\right )& & \\ \end{array}$$
Fig. 5.4 Learning curves for the RLS algorithm for eigenvalue spreads of 1, 20, and 80; λ = 0.99

Table 5.1 Evaluation of the RLS algorithm

The conventional RLS algorithm is implemented with finite-precision arithmetic, using fixed-point representations with 16, 12, and 10 bits. The results presented are measured before any sign of instability is noticed. Table 5.2 summarizes the results of the finite-precision implementation of the conventional RLS algorithm. Note that in most cases there is close agreement between the measured results and those predicted by the equations given below. These equations correspond to (16.37) and (16.48) derived in Chap. 16.

$$\begin{array}{rcl} E[\vert \vert \Delta \mathbf{w}{(k)}_{Q}\vert {\vert }^{2}]& & \approx \frac{(1 - \lambda )(N + 1)} {2\lambda } \,\frac{{\sigma }_{n}^{2} + {\sigma }_{e}^{2}} {{\sigma }_{x}^{2}} + \frac{(N + 1){\sigma }_{\mathbf{w}}^{2}} {2\lambda (1 - \lambda )} \\ \xi {(k)}_{Q}& & \approx {\xi }_{\mathrm{min}} + {\sigma }_{e}^{2} + \frac{(N + 1){\sigma }_{\mathbf{w}}^{2}{\sigma }_{ x}^{2}} {2\lambda (1 - \lambda )} \\ \end{array}$$

For the simulations with 12 and 10 bits, the discrepancy between the measured and theoretical estimates of E[ | | Δ w(k) Q | | 2] is caused by the freezing of some coefficients.

Table 5.2 Results of the finite-precision implementation of the RLS algorithm

If the results presented here are compared with those in Table 3.2 for the LMS algorithm, we notice that both the LMS and the RLS algorithms performed well in the finite-precision implementation. The reader should bear in mind that the conventional RLS algorithm requires an expensive strategy to keep the deterministic correlation matrix positive definite, as discussed in Chap. 16.
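A direct way to use the two finite-precision predictions quoted above is to code them as a small helper, as sketched below. Here σe2 denotes the quantization-noise variance and σw2 the variance of the coefficient-quantization disturbance; both follow from the chosen wordlengths as derived in Chap. 16 and must be supplied by the reader, so the function merely transcribes the two expressions.

```python
def rls_finite_precision_predictions(N, lam, sigma_n2, sigma_e2, sigma_x2, sigma_w2, xi_min):
    # Evaluate the two expressions quoted above (corresponding to (16.37) and (16.48)).
    coef_dev = ((1.0 - lam) * (N + 1) / (2.0 * lam) * (sigma_n2 + sigma_e2) / sigma_x2
                + (N + 1) * sigma_w2 / (2.0 * lam * (1.0 - lam)))
    excess_mse = xi_min + sigma_e2 + (N + 1) * sigma_w2 * sigma_x2 / (2.0 * lam * (1.0 - lam))
    return coef_dev, excess_mse   # E[||Δw(k)_Q||^2] and ξ(k)_Q
```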

The simulations related to the experiment described for nonstationary environments are also performed. From the simulations we measure the total excess MSE, and then compare the results to those obtained with the expression below.

$$\begin{array}{rcl}{ \xi }_{\mathrm{exc}}& \approx & (N + 1)\frac{1 - \lambda } {1 + \lambda }\left (1 + \frac{1 - \lambda } {1 + \lambda }\mathcal{K}\right ){\xi }_{\min } + \frac{(N + 1){\sigma }_{\mathbf{w}}^{2}{\sigma }_{x}^{2}} {{\lambda }_{\mathbf{w}}(1 + {\lambda }^{2}) - \lambda (1 + {\lambda }_{\mathbf{w}}^{2})}\left (\frac{1 - \lambda } {1 + \lambda } -\frac{1 - {\lambda }_{\mathbf{w}}} {1 + {\lambda }_{\mathbf{w}}}\right )\\ \end{array}$$

An attempt to use the optimal value of λ is made. The predicted optimal value is, in this case, too small, and as a consequence λ = 0.99 is used. The measured excess MSE is 0.0254, whereas the theoretical value predicted by the above equation is 0.0418. Note that the theoretical result is not as accurate as in the previous cases discussed so far, due to a number of approximations used in the analysis. However, the above equation provides a good indication of what is expected in a practical implementation. By choosing a smaller value for λ, better tracking performance is obtained, a situation in which the above equation is not as accurate. □
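For reference, the tracking excess-MSE expression used in this example can also be evaluated directly; the helper below is a sketch that simply transcribes the formula, with K = 2 assumed for a Gaussian white input.

```python
def rls_tracking_excess_mse(N, lam, lam_w, xi_min, sigma_w2, sigma_x2, K=2.0):
    # Transcription of the excess-MSE expression above: finite-memory term plus lag term.
    a = (1.0 - lam) / (1.0 + lam)
    a_w = (1.0 - lam_w) / (1.0 + lam_w)
    lag_den = lam_w * (1.0 + lam**2) - lam * (1.0 + lam_w**2)
    return ((N + 1) * a * (1.0 + a * K) * xi_min
            + (N + 1) * sigma_w2 * sigma_x2 / lag_den * (a - a_w))
```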

5.6.3 Signal Enhancement Simulations

Example 5.5.

We solved the same signal enhancement problem described in Sect. 4.7.3 with the conventional RLS and LMS algorithms.

Solution.

For the LMS algorithm, the convergence factor is chosen as μmax ∕ 5. The resulting value of μ in the LMS case is 0.001, whereas λ = 1.0 is used for the RLS algorithm. The learning curves for the algorithms are shown in Fig. 5.5, where we can verify the faster convergence of the RLS algorithm. By plotting the output errors after convergence, we noted the large variance of the MSE for both algorithms, a result due to the small signal-to-noise ratio in this case. Figure 5.6 depicts the output error and its DFT with 128 points for the RLS algorithm. In both cases, we can clearly detect the presence of the sinusoid.

Fig. 5.5 Learning curves for the (a) LMS and (b) RLS algorithms

Fig. 5.6 (a) Output error for the RLS algorithm and (b) DFT of the output error

5.7 Concluding Remarks

In this chapter, we introduced the conventional RLS algorithm and discussed various aspects related to its performance behavior. Many of the results obtained here through mathematical analysis are valid for the whole class of RLS algorithms to be presented in the following chapters, except for the finite-precision analysis, since that depends on how the internal calculations of each algorithm are performed. The analysis presented here is far from complete. However, the main aspects of the conventional RLS algorithm have been addressed, such as its convergence behavior and tracking capabilities. The interested reader should consult [7–9] for some further results. Chapter 16 complements this chapter by addressing the finite-precision analysis of the conventional RLS algorithm.

From the analysis presented, one can conclude that the computational complexity and the stability in finite-precision implementations are two aspects of concern. When the elements of the input signal vector consist of delayed versions of the same signal, it is possible to derive a number of fast RLS algorithms whose computational complexity is of order N per output sample. Several different classes of these algorithms are presented in the following chapters. In all cases, their stability conditions in finite-precision implementations are briefly discussed.

For the general case where the elements of the input signal vector have different origins, the QR-RLS algorithm is a good alternative to the conventional RLS algorithm. The stability of the QR-RLS algorithm can be easily guaranteed.

The conventional RLS algorithm is fully tested in a number of simulation results included in this chapter. These examples were meant to verify the theoretical results discussed in the present chapter and to compare the RLS algorithm with the LMS algorithm.

The LMS algorithm is usually referred to as a stochastic gradient algorithm, originating from the stochastic formulation of the Wiener filter, which in turn deals with stationary noises and signals. The RLS algorithm is derived from a deterministic formulation meant to achieve a weighted least-squares error minimization in a sequential recursive form. A widely known generalization of the Wiener filter is the Kalman filter, which deals with nonstationary noises and signals utilizing a stochastic formulation. However, it is possible to show that the discrete-time version of the Kalman filtering algorithm can be considered a generalization of the RLS algorithm. In Chap. 17 we present a brief description of Kalman filters as well as their relationship with the RLS algorithm.

5.8 Problems

  1. 1.

    The RLS algorithm is used to predict the signal \(x(k) =\cos \frac{\pi k} {3}\) using a second-order FIR filter with the first tap fixed at 1. Given λ = 0.98, calculate the output signal y(k) and the tap coefficients for the first ten iterations. Note that we aim at the minimization of E[y 2(k)].

    Start with \({\mathbf{w}}^{T}(-1) = [1\ 0\ 0]\) and δ = 100.

  2. 2.

    Show that the solution in (5.4) is a minimum point.

  3. 3.

    Show that S D (k) approaches a null matrix for large k, when λ = 1.

  4. 4.

    Suppose that the measurement noise n(k) is a zero-mean random signal with a Gaussian (normal) probability density. In a sufficient-order identification of an FIR system with optimal coefficients given by w o , show that the least-squares solution with λ = 1 is also normally distributed with mean w o and covariance E[S D (k)σ n 2].

  5. 5.

    Prove that (5.42) is valid. What is the result when n(k) has zero mean and is correlated to the input signal x(k)?

    Hint: You can use the relation \(E[{e}^{2}(k)] = E{[e(k)]}^{2} + {\sigma }^{2}[e(k)]\), where σ2[ ⋅] means variance of [ ⋅].

  6. 6.

    Consider that the additive noise n(k) is uncorrelated with the input and the desired signals and is also a nonwhite noise with autocorrelation matrix R n . Determine the transfer function of a prewhitening filter that, applied to d′(k) + n(k) and x(k), generates the optimum least-squares solution \({\mathbf{w}}_{o} ={ \mathbf{R}}^{-1}\mathbf{p}\) for k → ∞.

  7. 7.

    Show that if the additive noise is uncorrelated with d′(k) and x(k), and nonwhite, the least-squares algorithm will converge asymptotically to the optimal solution.

  8. 8.

    In Problem 4, when n(k) is correlated to x(k), is w o still the optimal solution? If not, what is the optimal solution?

  9. 9.

    Show that in the RLS algorithm the following relation is true

    $$\begin{array}{rcl}{ \xi }^{d}(k) = \lambda {\xi }^{d}(k - 1) + \epsilon (k)e(k)& & \\ \end{array}$$

    where e(k) is the a priori error as defined in (5.8).

  10. 10.

    Prove the validity of the approximation in (5.80).

  11. 11.

    Demonstrate that the updating formula for the complex RLS algorithm is given by (5.98).

  12. 12.

    Show that, for an input signal with a diagonally dominant correlation matrix R, the following approximation related to (16.28) and (16.32) is valid.

    $$\begin{array}{rcl} & & E\left \{{\mathbf{N}}_{{\mathbf{S}}_{D}}(k)\mathbf{x}(k){\mathbf{x}}^{T}(k)\mathrm{cov}\,\left [\Delta \mathbf{w}{(k - 1)}_{ Q}\right ]\mathbf{x}(k){\mathbf{x}}^{T}(k){\mathbf{N}}_{{\mathbf{ S}}_{D}}(k)\right \} \\ & & \qquad \approx {\sigma }_{{\mathbf{S}}_{D}}^{2}{\sigma }_{ x}^{4}\mathrm{tr}\left \{\mathrm{cov}\,\left [\Delta \mathbf{w}{(k - 1)}_{ Q}\right ]\right \}\mathbf{I} \\ \end{array}$$
  13. 13.

    Derive (16.35)–(16.37).

  14. 14.

    The conventional RLS algorithm is applied to identify a 7th-order time-varying unknown system whose coefficients are first-order Markov processes with λ w = 0.999 and σ w 2 = 0.033. The initial time-varying system multiplier coefficients are

    $${\mathbf{w}}_{o}^{T} = [0.03490\: - 0.01100\: - 0.06864\:0.22391\:0.55686\:0.35798\: - 0.02390\: - 0.07594]$$

    The input signal is Gaussian white noise with variance σ x 2 = 1 and the measurement noise is also Gaussian white noise independent of the input signal and of the elements of n w (k), with variance σ n 2 = 0.01.

    1. (a)

      For λ = 0.97, compute the excess MSE.

    2. (b)

      Repeat (a) for λ = λopt.

    3. (c)

      Simulate the experiment described, measure the excess MSE, and compare to the calculated results.

  15. 15.

    Reduce the value of λ w to 0.97 in Problem 14, simulate, and comment on the results.

  16. 16.

    Suppose a 15th-order FIR digital filter with multiplier coefficients given below is identified through an adaptive FIR filter of the same order using the conventional RLS algorithm. Consider that fixed-point arithmetic is used.

    $$\begin{array}{ll} \mbox{ Additional noise : white noise with variance} &{\sigma }_{n}^{2} = 0.0015 \\ \mbox{ Coefficient wordlength:} &\,{b}_{c} = 16\mbox{ bits} \\ \mbox{ Signal wordlength:} &{b}_{d} = 16\mbox{ bits} \\ \mbox{ Input signal: Gaussian white noise with variance}&{\sigma }_{x}^{2} = 0.7 \\ &\,\,\,\lambda = {\lambda }_{\mathrm{opt}}\end{array}$$

    \({\mathbf{w}}_{o}^{T} = [0.0219360\:\:\:0.0015786\:\:\: - 0.0602449\:\:\: - 0.0118907\:\:\:0.1375379\:\:\:\) \(0.0574545\:\:\: - 0.3216703\:\:\: - 0.5287203\:\:\: - 0.2957797\:\:\:0.0002043\:\:\:0.290670\:\:\:\) \(-\,0.0353349\:\:\: - 0.0068210\:\:\:0.0026067\:\:\:0.0010333\:\:\: - 0.0143593]\)

    1. (a)

      Compute the expected value for | | Δ w(k) Q | | 2 and ξ(k) Q for the described case.

    2. (b)

      Simulate the identification example described and compare the simulated results with those obtained through the closed form formulas.

    3. (c)

      Plot the learning curves for the finite- and infinite-precision implementations. Also, plot E[ | | Δ w(k) | | 2] versus k in both cases.

  17. 17.

    Repeat the above problem for the following cases

    1. (a)

      \({\sigma }_{n}^{2} = 0.01,{b}_{c} = 9\) bits, b d = 9 bits, \({\sigma }_{x}^{2} = 0.7,\lambda = {\lambda }_{\mathrm{opt}}.\)

    2. (b)

      \({\sigma }_{n}^{2} = 0.1,{b}_{c} = 10\) bits, b d = 10 bits, \({\sigma }_{x}^{2} = 0.8,\lambda = {\lambda }_{\mathrm{opt}}\).

    3. (c)

      \({\sigma }_{n}^{2} = 0.05,{b}_{c} = 8\) bits, b d = 16 bits, \({\sigma }_{x}^{2} = 0.8,\lambda = {\lambda }_{\mathrm{opt}}\).

  18. 18.

    In Problem 17, compute (do not simulate) E[ | | Δ w(k) Q | | 2], ξ(k) Q , and the probable number of iterations before the algorithm stops updating, for \(\lambda = 1,\lambda = 0.98,\lambda = 0.96\), and λ = λopt.

  19. 19.

    Repeat Problem 16 for the case where the input signal is a first-order Markov process with λ x = 0.95.

  20. 20.

    A digital channel model can be represented by the following impulse response:

    $$\begin{array}{rcl} & & [-0.001\:\ - 0.002\:\ 0.002\:\ 0.2\:\ 0.6\:\ 0.76\:\ 0.9\:\ 0.78\:\ 0.67\:\ 0.58 \\ & & 0.45\:\ 0.3\:\ 0.2\:\ 0.12\:\ 0.06\:\ 0\:\ - 0.2\:\ - 1\:\ - 2\:\ - 1\:\ 0\:\ 0.1] \\ \end{array}$$

    The channel is corrupted by Gaussian noise with power spectrum given by

    $$\vert S({\mathrm{e}}^{\mathrm{J}\omega }){\vert }^{2} = \kappa ^{\prime}\vert \omega {\vert }^{3/2}$$

    where \(\kappa ^{\prime} = 1{0}^{-1.5}\). The training signal consists of independent binary samples ( − 1, 1).

    Design an FIR equalizer for this problem and use the RLS algorithm. Use a filter of order 50 and plot the learning curve.

  21. 21.

    For the previous problem, using the maximum of 51 adaptive-filter coefficients, implement a DFE equalizer and compare the results with those obtained with the FIR equalizer. Again use the RLS algorithm.

  22. 22.

    Use the complex RLS algorithm to equalize a channel with the transfer function given below. The input signal is a four QAM signal representing a randomly generated bit stream with the signal-to-noise ratio \(\frac{{\sigma }_{\tilde{x}}^{2}} {{\sigma }_{n}^{2}} = 20\) at the receiver end, that is, \(\tilde{x}(k)\) is the received signal without taking into consideration the additional channel noise. The adaptive filter has ten coefficients.

    $$H(z) = (0.34 - 0.27\mathrm{J}) + (0.87 + 0.43\mathrm{J}){z}^{-1} + (0.34 - 0.21\mathrm{J}){z}^{-2}$$
    1. (a)

      Use an appropriate value for λ in the range 0.95–0.99, run the algorithm and comment on the convergence behavior.

    2. (b)

      Plot the real versus imaginary parts of the received signal before and after equalization.

    3. (c)

      Increase the number of coefficients to 20 and repeat the experiment in (b).

  23. 23.

    In a system identification problem the input signal is generated from a four QAM signal of the form

    $$x(k) = {x}_{\mathrm{re}}(k) + \mathrm{J}{x}_{\mathrm{im}}(k)$$

    where x re(k) and x im(k) assume values ± 1 randomly generated. The unknown system is described by

    $$H(z) = 0.5 + 0.2\mathrm{J} + (-0.1 + 0.4\mathrm{J}){z}^{-1} + (0.2 - 0.4\mathrm{J}){z}^{-2} + (0.2 + 0.7\mathrm{J}){z}^{-3}$$

    The adaptive filter is also a third-order complex FIR filter, and the additional noise is zero-mean Gaussian white noise with variance σ n 2 = 0. 3. Using the complex RLS algorithm run an ensemble of 20 experiments, and plot the average learning curve.

  24. 24.

    Apply the Kalman filter to equalize the system

    $$H(z) = \frac{0.19z} {z - 0.9}$$

    when the additional noise is a uniformly distributed white noise with variance σ n 2 = 0.1, and the input signal to the channel is a Gaussian noise with unit variance.

  25. 25.

    Assume a sufficient-order system identification application with an acceptable misadjustment of about 20%. Consider that the input signal is Gaussian white noise.

    1. (a)

      Calculate the appropriate value of λ required by the RLS algorithm in order to achieve this goal considering an unknown system with eight coefficients.

    2. (b)

      Calculate the value of μ for the affine projection algorithm with L = 3.

    3. (c)

      If the unknown system consisted of a first-order Markov process with σ n 2 = 4σ w 2 and with eight coefficients, what would be ξtotal considering κ w = 1 in (4.134) and λ w ≈ 1?