1 Introduction

Iterative methods and recursive methods have been widely used in system identification [7, 8, 15], system control [31, 33], signal processing [30] and multivariate pseudo-linear regressive analysis [37], as well as for solving matrix equations [6]. Parameter estimation methods have received much attention in system identification [28, 38, 39]. For example, Wang [35] gave a least squares-based recursive estimation algorithm and an iterative estimation algorithm for output error moving average systems using the data filtering technique. Dehghan et al. [5] studied fourth-order variants of Newton’s method that avoid second-order derivatives for solving nonlinear equations. Shi et al. [32] studied the output feedback stabilization of networked control systems with random delays modeled by Markov chains. Li [25] developed a maximum likelihood estimation algorithm for Hammerstein CARARMA systems based on the Newton iteration. Wang and Zhang [43] proposed an improved least squares identification algorithm for multivariable Hammerstein systems.

Typical nonlinear systems include the Wiener systems [14], the Hammerstein systems [18], the Hammerstein–Wiener systems [3] and feedback nonlinear systems [1, 20]. A Wiener model is composed of a linear dynamic block followed by a static nonlinear function, whereas a Hammerstein model places the nonlinear function before the linear dynamic block [2, 44]. Vörös [34] proposed the key term separation technique for identifying Hammerstein systems with multi-segment piecewise-linear characteristics. Recently, a decomposition-based Newton iterative identification method was proposed for a Hammerstein nonlinear FIR system with ARMA noise [12].

This paper considers the parameter identification problem of a special class of output nonlinear systems, whose output is a nonlinear function of the past outputs [7, 8]. For this class of nonlinear systems with colored noise, Wang et al. [36] gave least squares-based and gradient-based iterative identification algorithms; Hu et al. [21] proposed a recursive extended least squares parameter estimation algorithm using the over-parameterization model and a multi-innovation generalized extended stochastic gradient algorithm for nonlinear autoregressive moving average systems [22]; Bai presented an optimal two-stage identification algorithm for Hammerstein–Wiener nonlinear systems [4]. The least squares algorithms play a key role in the parameter estimation of linear systems [17, 19, 29]. This paper derives a new recursive least squares algorithm for output nonlinear systems using the hierarchical identification principle. The proposed method has a lower computational load and can be extended to study the parameter estimation of dual-rate/multi-rate sampled systems [9, 10, 26].

This paper is organized as follows: Section 2 gives the representation of a class of nonlinear systems. Sections 3 and 4 derive a least squares algorithm and a model decomposition-based recursive least squares algorithm, respectively. Section 5 compares the computational efficiency of the proposed algorithm with that of the recursive extended least squares algorithm. Section 6 provides a numerical example to show the effectiveness of the proposed algorithm. Finally, some concluding remarks are offered in Sect. 7.

2 The System Description and its Identification Model

Let us define some notation. “\(A=:X\)” or “\(X:=A\)” stands for “A is defined as X”; \(\mathbf{1}_n\) denotes an n-dimensional column vector whose elements are all 1; \({\varvec{I}}\) (\({\varvec{I}}_n\)) represents an identity matrix of appropriate size (\(n\times n\)); z denotes the unit forward shift operator with \(zx(t)=x(t+1)\) and \(z^{-1}x(t)=x(t-1)\). Define the polynomials in the unit backward shift operator \(z^{-1}\):

$$\begin{aligned} A'(z):= & {} a'_1z^{-1}+a'_2z^{-2}+\cdots +a'_{n_a}z^{-n_a},\\ A(z):= & {} a_1z^{-1}+a_2z^{-2}+\cdots +a_{n_a}z^{-n_a},\\ B(z):= & {} b_1z^{-1}+b_2z^{-2}+\cdots +b_{n_b}z^{-n_b},\\ D(z):= & {} 1+d_1z^{-1}+d_2z^{-2}+\cdots +d_{n_d}z^{-n_d}, \end{aligned}$$

and the parameter vectors:

$$\begin{aligned} {\varvec{a}}:= & {} [a_1, a_2, \ldots , a_{n_a}]^{\tiny \text{ T }}\in {\mathbb R}^{n_a},\\ {\varvec{b}}:= & {} [b_1, b_2, \ldots , b_{n_b}]^{\tiny \text{ T }}\in {\mathbb R}^{n_b},\\ {\varvec{c}}:= & {} [c_1, c_2, \ldots , c_{n_c}]^{\tiny \text{ T }}\in {\mathbb R}^{n_c},\\ {\varvec{d}}:= & {} [d_1, d_2, \ldots , d_{n_d}]^{\tiny \text{ T }}\in {\mathbb R}^{n_d}. \end{aligned}$$

A Hammerstein system (i.e., an input nonlinear system) can be expressed as [11]

$$\begin{aligned} y(t)=A'(z)y(t)+B(z)f(u(t))+D(z)v(t), \end{aligned}$$

by extending the input nonlinearity to the output nonlinearity, we can obtain a special class of nonlinear systems [21, 36]:

$$\begin{aligned} y(t)=A(z)f(y(t))+B(z)u(t)+D(z)v(t), \end{aligned}$$
(1)

where u(t) and y(t) are the input and output of the system, respectively, and v(t) is white noise with zero mean and variance \(\sigma ^2\).

For simplicity, assume that the nonlinear part is a linear combination of a known basis \({\varvec{f}}:=(f_1, f_2, \ldots , f_{n_c})\) with coefficients \((c_1, c_2, \ldots , c_{n_c})\):

$$\begin{aligned} \bar{y}(t):=f(y(t))=c_1f_1(y(t))+c_2f_2(y(t))+\cdots +c_{n_c}f_{n_c}(y(t))={\varvec{f}}(y(t)){\varvec{c}}. \end{aligned}$$

For parameter identifiability, we must either fix one of the coefficients \(c_i\) or impose the normalization \(\Vert {\varvec{c}}\Vert =1\) with \(c_1>0\) [11].

Equation (1) can be rewritten as

$$\begin{aligned} y(t)= & {} \sum \limits _{i=1}^{n_a}{a_i}z^{-i}{f(y(t))} +\sum \limits _{i=1}^{n_b}{b_i}z^{-i}{u(t)}+\sum \limits _{i=1}^{n_d}d_iz^{-i}v(t)+v(t)\nonumber \\= & {} a_1f(y(t-1))+a_2f(y(t-2))+\cdots +a_{n_a}f(y(t-n_a))\nonumber \\&+\,b_1u(t-1)+b_2u(t-2)+\cdots +b_{n_b}u(t-n_b)\nonumber \\&+\,d_1v(t-1)+d_2v(t-2)+\cdots +d_{n_d}v(t-{n_d})+v(t). \end{aligned}$$
(2)

Define the information matrix \({\varvec{F}}(t)\), the input information vector \({\varvec{\varphi }}(t)\) and the noise information vector \({\varvec{\psi }}(t)\) as

$$\begin{aligned} {\varvec{F}}(t):= & {} \left[ \begin{array}{cccc}f_1(y(t-1)) &{} f_2(y(t-1)) &{} \ldots &{} f_{n_c}(y(t-1))\\ f_1(y(t-2)) &{} f_2(y(t-2)) &{} \ldots &{} f_{n_c}(y(t-2))\\ \vdots &{} \vdots &{} &{} \vdots \\ f_1(y(t-n_a)) &{} f_2(y(t-n_a)) &{} \ldots &{} f_{n_c}(y(t-n_a))\\ \end{array}\right] \in {\mathbb R}^{n_a\times n_c}, \end{aligned}$$
(3)
$$\begin{aligned} {\varvec{\varphi }}(t):= & {} [u(t-1), u(t-2), \ldots , u(t-n_b)]^{\tiny \text{ T }}\in {\mathbb R}^{n_b}, \end{aligned}$$
(4)
$$\begin{aligned} {\varvec{\psi }}(t):= & {} [v(t-1), v(t-2), \ldots , v(t-n_d)]^{\tiny \text{ T }}\in {\mathbb R}^{n_d}. \end{aligned}$$
(5)

Then, Eq. (2) can be written as

$$\begin{aligned} y(t)={\varvec{a}}^{\tiny \text{ T }}{\varvec{F}}(t){\varvec{c}}+{\varvec{\varphi }}^{\tiny \text{ T }}(t){\varvec{b}}+{\varvec{\psi }}^{\tiny \text{ T }}(t){\varvec{d}}+v(t). \end{aligned}$$
(6)

The objective of identification is to present new methods for estimating the unknown parameter vector \({\varvec{c}}\) for the nonlinear part and the unknown parameter vectors \({\varvec{a}}\), \({\varvec{b}}\) and \({\varvec{d}}\) for the linear subsystems from the measurement data \(\{u(t), y(t):\ t=1, 2, 3,\ldots \}\).
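To make the identification model concrete, the following minimal sketch (illustrative, not part of the paper) simulates data from model (2) for an assumed two-function basis \(f_1(y)=y\), \(f_2(y)=\sin y\) (the basis used in the example of Sect. 6) and builds the information matrix \({\varvec{F}}(t)\) of (3); all function and variable names are ours.

```python
import numpy as np

# Assumed basis functions f_1(y) = y, f_2(y) = sin(y), as in the example of Sect. 6
basis = [lambda y: y, lambda y: np.sin(y)]

def simulate(a, b, c, d, N, sigma=0.5, seed=0):
    """Generate data from y(t) = A(z) f(y(t)) + B(z) u(t) + D(z) v(t), Eq. (2)."""
    rng = np.random.default_rng(seed)
    na, nb, nd = len(a), len(b), len(d)
    u = rng.standard_normal(N)           # persistently exciting input
    v = sigma * rng.standard_normal(N)   # white noise with variance sigma^2
    y = np.zeros(N)
    f = lambda yy: sum(ci * fi(yy) for ci, fi in zip(c, basis))   # f(y) = f(y) c
    for t in range(N):
        y[t] = (sum(a[i] * f(y[t - 1 - i]) for i in range(na) if t - 1 - i >= 0)
                + sum(b[i] * u[t - 1 - i] for i in range(nb) if t - 1 - i >= 0)
                + sum(d[i] * v[t - 1 - i] for i in range(nd) if t - 1 - i >= 0)
                + v[t])
    return u, y, v

def F_matrix(y, t, na):
    """Information matrix F(t) in Eq. (3); assumes t >= na."""
    return np.array([[fj(y[t - i]) for fj in basis] for i in range(1, na + 1)])
```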

3 The Least Squares Algorithm Based on the Model Decomposition

Let \(\hat{{\varvec{\theta }}}(t):=\left[ \begin{array}{c} \hat{{\varvec{a}}}(t) \\ \hat{{\varvec{d}}}(t) \end{array} \right] \) and \(\hat{{\varvec{\vartheta }}}(t):=\left[ \begin{array}{c} \hat{{\varvec{b}}}(t) \\ \hat{{\varvec{c}}}(t) \end{array} \right] \) denote the estimates of \({\varvec{\theta }}:=\left[ \begin{array}{c} {\varvec{a}} \\ {\varvec{d}} \end{array} \right] \) and \({\varvec{\vartheta }}:=\left[ \begin{array}{c} {\varvec{b}} \\ {\varvec{c}} \end{array} \right] \) at time t, respectively, and \({\varvec{\varTheta }}:=\left[ \begin{array}{c} {\varvec{\theta }} \\ {\varvec{\vartheta }} \end{array} \right] \). For the identification model in (6), using the hierarchical identification principle (the decomposition technique), define the quadratic cost functions:

$$\begin{aligned} J_1({\varvec{\theta }}):= & {} J({\varvec{a}}, \hat{{\varvec{b}}}(t-1), \hat{{\varvec{c}}}(t-1), {\varvec{d}})\\= & {} \sum \limits _{j=1}^t\left[ \begin{array}{c}y(j)-{\varvec{\varphi }}^{\tiny \text{ T }}(j) \hat{{\varvec{b}}}(t-1)-[\hat{{\varvec{c}}}^{\tiny \text{ T }}(t-1){\varvec{F}}^{\tiny \text{ T }}(j), {\varvec{\psi }}^{\tiny \text{ T }}(j)] {\varvec{\theta }}\end{array}\right] ^2,\\ J_2({\varvec{\vartheta }}):= & {} J(\hat{{\varvec{a}}}(t), {\varvec{b}}, {\varvec{c}}, \hat{{\varvec{d}}}(t))\\= & {} \sum \limits _{j=1}^t\left[ \begin{array}{c}y(j)-{\varvec{\psi }}^{\tiny \text{ T }}(j) \hat{{\varvec{d}}}(t)-[{\varvec{\varphi }}^{\tiny \text{ T }}(j), \hat{{\varvec{a}}}^{\tiny \text{ T }}(t){\varvec{F}}(j)] {\varvec{\vartheta }}\end{array}\right] ^2. \end{aligned}$$

Define the output vector \({\varvec{Y}}_t\) and the information matrices \({\varvec{\varPhi }}_t\), \({\varvec{\varPsi }}_t\), \(\hat{{\varvec{\varOmega }}}_t\) and \(\hat{{\varvec{\varXi }}}_t\) as

$$\begin{aligned} {\varvec{Y}}_t:= & {} [y(1), y(2), \ldots , y(t)]^{\tiny \text{ T }}\in {\mathbb R}^t, \end{aligned}$$
(7)
$$\begin{aligned} {\varvec{\varPhi }}_t:= & {} [{\varvec{\varphi }}(1),{\varvec{\varphi }}(2), \ldots , {\varvec{\varphi }}(t)]^{\tiny \text{ T }}\in {\mathbb R}^{t\times {n_b}}, \end{aligned}$$
(8)
$$\begin{aligned} {\varvec{\varPsi }}_t:= & {} [{\varvec{\psi }}(1), {\varvec{\psi }}(2), \ldots , {\varvec{\psi }}(t)]^{\tiny \text{ T }}\in {\mathbb R}^{t\times {n_d}}, \end{aligned}$$
(9)
$$\begin{aligned} \hat{{\varvec{\varOmega }}}_t:= & {} \left[ \begin{array}{cccc}{\varvec{F}}(1)\hat{{\varvec{c}}}(t-1) &{} {\varvec{F}}(2)\hat{{\varvec{c}}}(t-1) &{} \ldots &{} {\varvec{F}}(t)\hat{{\varvec{c}}}(t-1)\\ \hat{{\varvec{\psi }}}(1) &{} \hat{{\varvec{\psi }}}(2) &{} \ldots &{}\hat{{\varvec{\psi }}}(t)\\ \end{array}\right] ^{\tiny \text{ T }}\in {\mathbb R}^{t\times (n_a+n_d)}, \end{aligned}$$
(10)
$$\begin{aligned} \hat{{\varvec{\varXi }}}_t:= & {} \left[ \begin{array}{cccc}{\varvec{\varphi }}(1) &{} {\varvec{\varphi }}(2) &{} \ldots &{} {\varvec{\varphi }}(t)\\ {\varvec{F}}^{\tiny \text{ T }}(1)\hat{{\varvec{a}}}(t) &{} {\varvec{F}}^{\tiny \text{ T }}(2)\hat{{\varvec{a}}}(t) &{} \ldots &{} {\varvec{F}}^{\tiny \text{ T }}(t)\hat{{\varvec{a}}}(t)\\ \end{array}\right] ^{\tiny \text{ T }}\in {\mathbb R}^{t\times (n_b+n_c)}. \ \end{aligned}$$
(11)

Then, \(J_1({\varvec{\theta }})\) and \(J_2({\varvec{\vartheta }})\) can be equivalently written as

$$\begin{aligned} J_1({\varvec{\theta }})= & {} \Vert {\varvec{Y}}_t-{\varvec{\varPhi }}_t\hat{{\varvec{b}}}(t-1)-\hat{{\varvec{\varOmega }}}_t{\varvec{\theta }}\Vert ^2,\\ J_2({\varvec{\vartheta }})= & {} \Vert {\varvec{Y}}_t-{\varvec{\varPsi }}_t\hat{{\varvec{d}}}(t)-\hat{{\varvec{\varXi }}}_t{\varvec{\vartheta }}\Vert ^2. \end{aligned}$$

For the two optimization problems, letting the partial derivatives of \(J_1({\varvec{\theta }})\) and \(J_2({\varvec{\vartheta }})\) with respect to \({\varvec{\theta }}\) and \({\varvec{\vartheta }}\) be zero gives

$$\begin{aligned} \left. \frac{\partial J_1({\varvec{\theta }})}{\partial {\varvec{\theta }}}\right| _{{{\varvec{\theta }}}=\hat{{\varvec{\theta }}}(t)}= & {} -2\hat{{\varvec{\varOmega }}}^{\tiny \text{ T }}_t[{\varvec{Y}}_t-{\varvec{\varPhi }}_t\hat{{\varvec{b}}}(t-1)-\hat{{\varvec{\varOmega }}}_t\hat{{\varvec{\theta }}}(t)] =\mathbf{0}, \end{aligned}$$
(12)
$$\begin{aligned} \left. \frac{\partial J_2({\varvec{\vartheta }})}{\partial {\varvec{\vartheta }}}\right| _{{{\varvec{\vartheta }}}=\hat{{\varvec{\vartheta }}}(t)}= & {} -2\hat{{\varvec{\varXi }}}^{\tiny \text{ T }}_t[{\varvec{Y}}_t-{\varvec{\varPsi }}_t\hat{{\varvec{d}}}(t)-\hat{{\varvec{\varXi }}}_t\hat{{\varvec{\vartheta }}}(t)] =\mathbf{0}. \end{aligned}$$
(13)

or

$$\begin{aligned} \hat{{\varvec{\varOmega }}}^{\tiny \text{ T }}_t\hat{{\varvec{\varOmega }}}_t\hat{{\varvec{\theta }}}(t)= & {} \hat{{\varvec{\varOmega }}}^{\tiny \text{ T }}_t[{\varvec{Y}}_t-{\varvec{\varPhi }}_t\hat{{\varvec{b}}}(t-1)],\\ \hat{{\varvec{\varXi }}}_t^{\tiny \text{ T }}\hat{{\varvec{\varXi }}}_t\hat{{\varvec{\vartheta }}}(t)= & {} \hat{{\varvec{\varXi }}}^{\tiny \text{ T }}_t[{\varvec{Y}}_t-{\varvec{\varPsi }}_t\hat{{\varvec{d}}}(t)]. \end{aligned}$$

In order to ensure that the inverses of the matrices \(\hat{{\varvec{\varOmega }}}^{\tiny \text{ T }}_t\hat{{\varvec{\varOmega }}}_t\) and \(\hat{{\varvec{\varXi }}}_t^{\tiny \text{ T }}\hat{{\varvec{\varXi }}}_t\) exist, we suppose that the information matrices \(\hat{{\varvec{\varOmega }}}_t\) and \(\hat{{\varvec{\varXi }}}_t\) are persistently exciting. Let \(\hat{{\varvec{\varPsi }}}_t\), \(\hat{{\varvec{\psi }}}(t)\) and \(\hat{v}(t)\) be the estimates of \({\varvec{\varPsi }}_t\), \({\varvec{\psi }}(t)\) and v(t) at time t, respectively.

Replacing the unknown \({\varvec{\varPsi }}_t\) in (13) with its estimate \(\hat{{\varvec{\varPsi }}}_t\) and solving (12)–(13), we have the following least squares algorithm for estimating the parameter vectors \({\varvec{\theta }}\) and \({\varvec{\vartheta }}\):

$$\begin{aligned} \hat{{\varvec{\theta }}}(t)= & {} [\hat{{\varvec{\varOmega }}}^{\tiny \text{ T }}_t\hat{{\varvec{\varOmega }}}_t]^{-1}\hat{{\varvec{\varOmega }}}^{\tiny \text{ T }}_t[{\varvec{Y}}_t-{\varvec{\varPhi }}_t\hat{{\varvec{b}}}(t-1)], \end{aligned}$$
(14)
$$\begin{aligned} \hat{{\varvec{\vartheta }}}(t)= & {} [\hat{{\varvec{\varXi }}}_t^{\tiny \text{ T }}\hat{{\varvec{\varXi }}}_t]^{-1}\hat{{\varvec{\varXi }}}^{\tiny \text{ T }}_t[{\varvec{Y}}_t-\hat{{\varvec{\varPsi }}}_t\hat{{\varvec{d}}}(t)], \end{aligned}$$
(15)
$$\begin{aligned} \hat{{\varvec{\varPsi }}}_t= & {} [\hat{{\varvec{\psi }}}(1), \hat{{\varvec{\psi }}}(2), \ldots , \hat{{\varvec{\psi }}}(t)]^{\tiny \text{ T }}, \end{aligned}$$
(16)
$$\begin{aligned} \hat{{\varvec{\psi }}}(t)= & {} [\hat{v}(t-1), \hat{v}(t-2), \ldots , \hat{v}(t-n_d)]^{\tiny \text{ T }}, \end{aligned}$$
(17)
$$\begin{aligned} \hat{v}(t)= & {} y(t)-\hat{{\varvec{a}}}^{\tiny \text{ T }}(t){\varvec{F}}(t)\hat{{\varvec{c}}}(t)-{\varvec{\varphi }}^{\tiny \text{ T }}(t)\hat{{\varvec{b}}}(t)-\hat{{\varvec{\psi }}}^{\tiny \text{ T }}(t)\hat{{\varvec{d}}}(t). \end{aligned}$$
(18)
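As an illustration, the following numpy sketch performs one pass of the decomposition-based least squares estimates (14)–(15), together with the normalization of \(\hat{{\varvec{c}}}(t)\) used in Step 6 of the procedure below; it assumes the stacked data \({\varvec{Y}}_t\), \({\varvec{\varPhi }}_t\), \(\hat{{\varvec{\varPsi }}}_t\) and the matrices \({\varvec{F}}(j)\) have already been formed, and all names are illustrative.

```python
import numpy as np

def ls_decomposition_step(Y, F_list, Phi, Psi_hat, b_hat, c_hat, na, nd):
    """One pass of the decomposition-based LS estimates (14)-(15).

    Y       : (t,)   stacked outputs, Eq. (7)
    F_list  : list of F(j) matrices, Eq. (3), each na x nc
    Phi     : (t,nb) stacked phi(j)^T, Eq. (8)
    Psi_hat : (t,nd) stacked estimated noise vectors, Eq. (16)
    b_hat, c_hat : previous estimates of b and c
    """
    # Omega_hat_t, Eq. (10): row j is [ (F(j) c_hat)^T , psi_hat(j)^T ]
    Omega = np.hstack([np.vstack([F @ c_hat for F in F_list]), Psi_hat])
    theta, *_ = np.linalg.lstsq(Omega, Y - Phi @ b_hat, rcond=None)       # Eq. (14)
    a_hat, d_hat = theta[:na], theta[na:na + nd]

    # Xi_hat_t, Eq. (11): row j is [ phi(j)^T , a_hat^T F(j) ]
    Xi = np.hstack([Phi, np.vstack([a_hat @ F for F in F_list])])
    vartheta, *_ = np.linalg.lstsq(Xi, Y - Psi_hat @ d_hat, rcond=None)   # Eq. (15)
    nb = Phi.shape[1]
    b_new, c_raw = vartheta[:nb], vartheta[nb:]
    c_new = np.sign(c_raw[0]) * c_raw / np.linalg.norm(c_raw)             # normalization
    return a_hat, d_hat, b_new, c_new
```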

The procedures of computing the parameter estimates \(\hat{{\varvec{\theta }}}(t)\) and \(\hat{{\varvec{\vartheta }}}(t)\) are listed in the following.

1. To initialize: let \(t=p\), collect the input–output data \(\{u(i),y(i): i=0, 1, 2, \ldots , p-1\}\) (\(p\gg n_a+n_b+n_c+n_d\)), and set the initial values \(\hat{{\varvec{b}}}(p-1)=\mathbf{1}_{n_b}/{p_0}\), \(\hat{{\varvec{c}}}(p-1)\) to a random vector with \(\Vert \hat{{\varvec{c}}}(p-1)\Vert =1\) and \(\hat{v}(i)\) to a random number, where \(p_0\) is normally a large positive number (e.g., \(p_0=10^6\)). Give the basis functions \(f_j(*)\).

2. Collect the input–output data u(t) and y(t), form \({\varvec{Y}}_t\) using (7), \({\varvec{F}}(t)\) using (3), \({\varvec{\varphi }}(t)\) using (4), \({\varvec{\varPhi }}_t\) using (8) and \(\hat{{\varvec{\psi }}}(t)\) using (17).

3. Compute \(\hat{{\varvec{\varOmega }}}_t\) using (10).

4. Update the parameter estimate \(\hat{{\varvec{\theta }}}(t)\) using (14) and read \(\hat{{\varvec{a}}}(t)\) and \(\hat{{\varvec{d}}}(t)\) from \(\hat{{\varvec{\theta }}}(t)=\left[ \begin{array}{c} \hat{{\varvec{a}}}(t) \\ \hat{{\varvec{d}}}(t) \end{array} \right] \).

5. Compute \(\hat{{\varvec{\varXi }}}_t\) using (11) and form \(\hat{{\varvec{\varPsi }}}_t\) using (16).

6. Update the parameter estimate \(\hat{{\varvec{\vartheta }}}(t)\) using (15), read \(\hat{{\varvec{b}}}(t)\) from \(\hat{{\varvec{\vartheta }}}(t)=\left[ \begin{array}{c} \hat{{\varvec{b}}}(t) \\ \hat{{\varvec{c}}}(t) \end{array} \right] \) and normalize \(\hat{{\varvec{c}}}(t)\) using

   $$\begin{aligned} \hat{{\varvec{c}}}(t)=\mathrm{sgn}\{[\hat{{\varvec{\vartheta }}}(t)](n_b+1)\}\frac{[\hat{{\varvec{\vartheta }}}(t)](n_b+1:n_b+n_c)}{\Vert [\hat{{\varvec{\vartheta }}}(t)](n_b+1:n_b+n_c)\Vert }. \end{aligned}$$

7. Compute \(\hat{v}(t)\) using (18).

8. Increase t by 1, go to Step 2 and continue the calculation.

4 The Recursive Least Squares Algorithm Based on the Model Decomposition

For the identification model in (6), define the quadratic cost functions:

$$\begin{aligned} J_3({\varvec{\theta }}):= & {} \sum \limits _{j=1}^t\left[ \begin{array}{c}y(j)-{\varvec{\varphi }}^{\tiny \text{ T }}(j){\varvec{b}}-[{\varvec{c}}^{\tiny \text{ T }}{\varvec{F}}^{\tiny \text{ T }}(j), {\varvec{\psi }}^{\tiny \text{ T }}(j)]{\varvec{\theta }}\end{array}\right] ^2,\\ J_4({\varvec{\vartheta }}):= & {} \sum \limits _{j=1}^t\left[ \begin{array}{c}y(j)-{\varvec{\psi }}^{\tiny \text{ T }}(j){\varvec{d}}-[{\varvec{\varphi }}^{\tiny \text{ T }}(j), {\varvec{a}}^{\tiny \text{ T }}{\varvec{F}}(j)]{\varvec{\vartheta }}\end{array}\right] ^2. \end{aligned}$$

Define the information vector \({\varvec{\varphi }}_1(t)\) and the information matrices \({\varvec{\varOmega }}_t\) and \({\varvec{\varXi }}_t\) as

$$\begin{aligned} {\varvec{\varphi }}_1(t):= & {} [{\varvec{c}}^{\tiny \text{ T }}{\varvec{F}}^{\tiny \text{ T }}(t), {\varvec{\psi }}^{\tiny \text{ T }}(t)]^{\tiny \text{ T }}\in {\mathbb R}^{n_a+n_d}, \end{aligned}$$
(19)
$$\begin{aligned} {\varvec{\varOmega }}_t:= & {} \left[ \begin{array}{cccc}{\varvec{F}}(1){\varvec{c}}&{} {\varvec{F}}(2){\varvec{c}}&{} \ldots &{} {\varvec{F}}(t){\varvec{c}}\\ {\varvec{\psi }}(1) &{} {\varvec{\psi }}(2) &{} \ldots &{} {\varvec{\psi }}(t)\\ \end{array}\right] ^{\tiny \text{ T }}\in {\mathbb R}^{t\times (n_a+n_d)}, \end{aligned}$$
(20)
$$\begin{aligned} {\varvec{\varXi }}_t:= & {} \left[ \begin{array}{cccc}{\varvec{\varphi }}(1) &{} {\varvec{\varphi }}(2) &{} \ldots &{} {\varvec{\varphi }}(t)\\ {\varvec{F}}^{\tiny \text{ T }}(1){\varvec{a}}&{} {\varvec{F}}^{\tiny \text{ T }}(2){\varvec{a}}&{} \ldots &{} {\varvec{F}}^{\tiny \text{ T }}(t){\varvec{a}}\\ \end{array}\right] ^{\tiny \text{ T }}\in {\mathbb R}^{t\times (n_b+n_c)}. \ \end{aligned}$$
(21)

Then, \(J_3({\varvec{\theta }})\) and \(J_4({\varvec{\vartheta }})\) can be equivalently rewritten as

$$\begin{aligned} J_3({\varvec{\theta }})= & {} \Vert {\varvec{Y}}_t-{\varvec{\varPhi }}_t{\varvec{b}}-{\varvec{\varOmega }}_t{\varvec{\theta }}\Vert ^2,\\ J_4({\varvec{\vartheta }})= & {} \Vert {\varvec{Y}}_t-{\varvec{\varPsi }}_t{\varvec{d}}-{\varvec{\varXi }}_t{\varvec{\vartheta }}\Vert ^2. \end{aligned}$$

Similarly, minimizing \(J_3({\varvec{\theta }})\) and \(J_4({\varvec{\vartheta }})\), we can obtain the least squares estimates:

$$\begin{aligned} \hat{{\varvec{\theta }}}(t)= & {} [{\varvec{\varOmega }}_t^{\tiny \text{ T }}{\varvec{\varOmega }}_t]^{-1}{\varvec{\varOmega }}_t^{\tiny \text{ T }}[{\varvec{Y}}_t-{\varvec{\varPhi }}_t{\varvec{b}}], \end{aligned}$$
(22)
$$\begin{aligned} \hat{{\varvec{\vartheta }}}(t)= & {} [{\varvec{\varXi }}_t^{\tiny \text{ T }}{\varvec{\varXi }}_t]^{-1}{\varvec{\varXi }}_t^{\tiny \text{ T }}[{\varvec{Y}}_t-{\varvec{\varPsi }}_t{\varvec{d}}]. \end{aligned}$$
(23)

Define the covariance matrix,

$$\begin{aligned} {\varvec{P}}_1^{-1}(t):= & {} {\varvec{\varOmega }}_t^{\tiny \text{ T }}{\varvec{\varOmega }}_t=\sum \limits _{j=1}^t{\varvec{\varphi }}_1(j){\varvec{\varphi }}_1^{\tiny \text{ T }}(j)\nonumber \\= & {} {\varvec{P}}_1^{-1}(t-1)+{\varvec{\varphi }}_1(t){\varvec{\varphi }}_1^{\tiny \text{ T }}(t)\in {\mathbb R}^{(n_a+n_d)\times (n_a+n_d)},\quad {\varvec{P}}_1(0)=p_0{\varvec{I}}_{n_a+n_d}.\qquad \quad \end{aligned}$$
(24)

Hence, Eq. (22) can be rewritten as

$$\begin{aligned} \hat{{\varvec{\theta }}}(t)= & {} {\varvec{P}}_1(t){\varvec{\varOmega }}_t^{\tiny \text{ T }}[{\varvec{Y}}_t-{\varvec{\varPhi }}_t{\varvec{b}}]\nonumber \\= & {} {\varvec{P}}_1(t)[{\varvec{\varOmega }}_{t-1}^{\tiny \text{ T }}, {\varvec{\varphi }}_1(t)]\left[ \begin{array}{c}{\varvec{Y}}_{t-1}-{\varvec{\varPhi }}_{t-1}{\varvec{b}}\\ y(t)-{\varvec{\varphi }}^{\tiny \text{ T }}(t){\varvec{b}}\end{array}\right] \nonumber \\= & {} {\varvec{P}}_1(t){\varvec{P}}^{-1}_1(t-1){\varvec{P}}_1(t-1)\left\{ {\varvec{\varOmega }}_{t-1}^{\tiny \text{ T }}[{\varvec{Y}}_{t-1}-{\varvec{\varPhi }}_{t-1}{\varvec{b}}]+{\varvec{\varphi }}_1(t)[y(t)-{\varvec{\varphi }}^{\tiny \text{ T }}(t){\varvec{b}}]\right\} \nonumber \\= & {} {\varvec{P}}_1(t){\varvec{P}}^{-1}_1(t-1)\hat{{\varvec{\theta }}}(t-1)+{\varvec{P}}_1(t){\varvec{\varphi }}_1(t)[y(t)-{\varvec{\varphi }}^{\tiny \text{ T }}(t){\varvec{b}}]. \end{aligned}$$
(25)

Applying the matrix inversion lemma [7, 27]

$$\begin{aligned} ({\varvec{A}}+{\varvec{B}}{\varvec{C}})^{-1}={\varvec{A}}^{-1}-{\varvec{A}}^{-1}{\varvec{B}}({\varvec{I}}+{\varvec{C}}{\varvec{A}}^{-1}{\varvec{B}})^{-1}{\varvec{C}}{\varvec{A}}^{-1} \end{aligned}$$

to (24) gives

$$\begin{aligned} {\varvec{P}}_1(t)={\varvec{P}}_1(t-1)-\frac{{\varvec{P}}_1(t-1){\varvec{\varphi }}_1(t){\varvec{\varphi }}_1^{\tiny \text{ T }}(t){\varvec{P}}_1(t-1)}{1+{\varvec{\varphi }}_1^{\tiny \text{ T }}(t){\varvec{P}}_1(t-1){\varvec{\varphi }}_1(t)}. \end{aligned}$$
(26)
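As a quick sanity check of (26) (an illustrative snippet, not part of the paper), the rank-one update should reproduce the direct inverse of \({\varvec{P}}_1^{-1}(t-1)+{\varvec{\varphi }}_1(t){\varvec{\varphi }}_1^{\tiny \text{ T }}(t)\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
P_prev = M @ M.T + n * np.eye(n)          # a positive-definite P_1(t-1)
phi1 = rng.standard_normal(n)             # information vector phi_1(t)

# Rank-one update via the matrix inversion lemma, Eq. (26)
P_new = P_prev - np.outer(P_prev @ phi1, phi1 @ P_prev) / (1.0 + phi1 @ P_prev @ phi1)

# Direct inversion of P_1^{-1}(t) = P_1^{-1}(t-1) + phi_1 phi_1^T, Eq. (24)
P_direct = np.linalg.inv(np.linalg.inv(P_prev) + np.outer(phi1, phi1))

assert np.allclose(P_new, P_direct)       # both routes give the same covariance matrix
```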

Pre-multiplying both sides of (24) by \({\varvec{P}}_1(t)\) gives

$$\begin{aligned} {\varvec{I}}={\varvec{P}}_1(t){\varvec{P}}_1^{-1}(t-1)+{\varvec{P}}_1(t){\varvec{\varphi }}_1(t){\varvec{\varphi }}^{\tiny \text{ T }}_1(t). \end{aligned}$$
(27)

Substituting (27) into (25) gives the recursive estimate of the parameter vector \({\varvec{\theta }}\):

$$\begin{aligned} \hat{{\varvec{\theta }}}(t)= & {} [{\varvec{I}}-{\varvec{P}}_1(t){\varvec{\varphi }}_1(t){\varvec{\varphi }}^{\tiny \text{ T }}_1(t)]\hat{{\varvec{\theta }}}(t-1)+{\varvec{P}}_1(t){\varvec{\varphi }}_1(t)[y(t)-{\varvec{\varphi }}^{\tiny \text{ T }}(t){\varvec{b}}]\nonumber \\= & {} \hat{{\varvec{\theta }}}(t-1)+{\varvec{P}}_1(t){\varvec{\varphi }}_1(t)[y(t)-{\varvec{\varphi }}^{\tiny \text{ T }}(t){\varvec{b}}-{\varvec{\varphi }}^{\tiny \text{ T }}_1(t)\hat{{\varvec{\theta }}}(t-1)]. \end{aligned}$$
(28)

Define the gain vector \({\varvec{L}}_1(t):={\varvec{P}}_1(t){\varvec{\varphi }}_1(t)\in {\mathbb R}^{n_a+n_d}\). Using (26), it follows that

$$\begin{aligned} {\varvec{L}}_1(t)=\frac{{\varvec{P}}_1(t-1){\varvec{\varphi }}_1(t)}{1+{\varvec{\varphi }}_1^{\tiny \text{ T }}(t) {\varvec{P}}_1(t-1){\varvec{\varphi }}_1(t)}. \end{aligned}$$
(29)

Using (29), Eq. (26) can be rewritten as

$$\begin{aligned} {\varvec{P}}_1(t)={\varvec{P}}_1(t-1)-{\varvec{L}}_1(t){\varvec{\varphi }}_1^{\tiny \text{ T }}(t){\varvec{P}}_1(t-1)=[{\varvec{I}}-{\varvec{L}}_1(t){\varvec{\varphi }}_1^{\tiny \text{ T }}(t)]{\varvec{P}}_1(t-1),\ {\varvec{P}}_1(0)=p_0{\varvec{I}}. \end{aligned}$$
(30)

Here, we can see that the right-hand sides of (19) and (28) contain the unknown parameter vectors \({\varvec{c}}\) and \({\varvec{b}}\), respectively. Replacing the unknown \({\varvec{b}}\) and \({\varvec{\varphi }}_1(t)\) in (28) and (30) with their corresponding estimates \(\hat{{\varvec{b}}}(t-1)\) and \(\hat{{\varvec{\varphi }}}_1(t)=[\hat{{\varvec{c}}}^{\tiny \text{ T }}(t-1){\varvec{F}}^{\tiny \text{ T }}(t), \hat{{\varvec{\psi }}}^{\tiny \text{ T }}(t)]^{\tiny \text{ T }}\), we have

$$\begin{aligned} \hat{{\varvec{\theta }}}(t)= & {} \hat{{\varvec{\theta }}}(t-1)+{\varvec{P}}_1(t)\hat{{\varvec{\varphi }}}_1(t)[y(t)-{\varvec{\varphi }}^{\tiny \text{ T }}(t)\hat{{\varvec{b}}}(t-1)-\hat{{\varvec{\varphi }}}^{\tiny \text{ T }}_1(t)\hat{{\varvec{\theta }}}(t-1)],\nonumber \\ {\varvec{P}}_1(t)= & {} [{\varvec{I}}_{n_a+n_d}-{\varvec{L}}_1(t)\hat{{\varvec{\varphi }}}_1^{\tiny \text{ T }}(t)]{\varvec{P}}_1(t-1),\ {\varvec{P}}_1(0)=p_0{\varvec{I}}_{n_a+n_d}, \end{aligned}$$
(31)
$$\begin{aligned} \hat{{\varvec{\psi }}}(t)= & {} [\hat{v}(t-1), \hat{v}(t-2), \ldots , \hat{v}(t-n_d)]^{\tiny \text{ T }}\in {\mathbb R}^{n_d}. \end{aligned}$$
(32)

Define the information vector \({\varvec{\varphi }}_2(t):=[{\varvec{\varphi }}^{\tiny \text{ T }}(t), {\varvec{a}}^{\tiny \text{ T }}{\varvec{F}}(t)]^{\tiny \text{ T }}\in {\mathbb R}^{n_b+n_c}\), the covariance matrix \({\varvec{P}}_2^{-1}(t):={\varvec{\varXi }}_t^{\tiny \text{ T }}{\varvec{\varXi }}_t\in {\mathbb R}^{(n_b+n_c)\times (n_b+n_c)}\) and the gain vector \({\varvec{L}}_2(t):={\varvec{P}}_2(t){\varvec{\varphi }}_2(t)\in {\mathbb R}^{n_b+n_c}\). Similarly, replacing the unknown terms with their estimates, we obtain the recursive estimate of the parameter vector \({\varvec{\vartheta }}\):

$$\begin{aligned} \hat{{\varvec{\vartheta }}}(t)= & {} \hat{{\varvec{\vartheta }}}(t-1)+{\varvec{P}}_2(t)\hat{{\varvec{\varphi }}}_2(t)[y(t)-\hat{{\varvec{\psi }}}^{\tiny \text{ T }}(t)\hat{{\varvec{d}}}(t)-\hat{{\varvec{\varphi }}}^{\tiny \text{ T }}_2(t)\hat{{\varvec{\vartheta }}}(t-1)], \end{aligned}$$
(33)
$$\begin{aligned} \hat{{\varvec{\varphi }}}_2(t)= & {} [{\varvec{\varphi }}^{\tiny \text{ T }}(t), \hat{{\varvec{a}}}^{\tiny \text{ T }}(t){\varvec{F}}(t)]^{\tiny \text{ T }}\in {\mathbb R}^{n_b+n_c}. \end{aligned}$$
(34)

Thus, we can summarize the recursive least squares algorithm for estimating the parameter vectors \({\varvec{\theta }}\) and \({\varvec{\vartheta }}\) of the nonlinear systems based on the model decomposition (the ON-RLS algorithm for short) as follows:

$$\begin{aligned} \hat{{\varvec{\theta }}}(t)= & {} \hat{{\varvec{\theta }}}(t-1)+{\varvec{L}}_1(t)[y(t)-{\varvec{\varphi }}^{\tiny \text{ T }}(t)\hat{{\varvec{b}}}(t-1)-\hat{{\varvec{\varphi }}}^{\tiny \text{ T }}_1(t)\hat{{\varvec{\theta }}}(t-1)], \end{aligned}$$
(35)
$$\begin{aligned} {\varvec{L}}_1(t)= & {} {\varvec{P}}_1(t-1)\hat{{\varvec{\varphi }}}_1(t)[1+\hat{{\varvec{\varphi }}}_1^{\tiny \text{ T }}(t){\varvec{P}}_1(t-1)\hat{{\varvec{\varphi }}}_1(t)]^{-1}, \end{aligned}$$
(36)
$$\begin{aligned} {\varvec{P}}_1(t)= & {} [{\varvec{I}}_{n_a+n_d}-{\varvec{L}}_1(t)\hat{{\varvec{\varphi }}}_1^{\tiny \text{ T }}(t)]{\varvec{P}}_1(t-1),\ {\varvec{P}}_1(0)=p_0{\varvec{I}}_{n_a+n_d}, \end{aligned}$$
(37)
$$\begin{aligned} \hat{{\varvec{\vartheta }}}(t)= & {} \hat{{\varvec{\vartheta }}}(t-1)+{\varvec{L}}_2(t)[y(t)-\hat{{\varvec{\psi }}}^{\tiny \text{ T }}(t)\hat{{\varvec{d}}}(t)-\hat{{\varvec{\varphi }}}^{\tiny \text{ T }}_2(t)\hat{{\varvec{\vartheta }}}(t-1)], \end{aligned}$$
(38)
$$\begin{aligned} {\varvec{L}}_2(t)= & {} {\varvec{P}}_2(t-1)\hat{{\varvec{\varphi }}}_2(t)[1+\hat{{\varvec{\varphi }}}_2^{\tiny \text{ T }}(t){\varvec{P}}_2(t-1)\hat{{\varvec{\varphi }}}_2(t)]^{-1}, \end{aligned}$$
(39)
$$\begin{aligned} {\varvec{P}}_2(t)= & {} [{\varvec{I}}_{n_b+n_c}-{\varvec{L}}_2(t)\hat{{\varvec{\varphi }}}_2^{\tiny \text{ T }}(t)]{\varvec{P}}_2(t-1),\ {\varvec{P}}_2(0)=p_0{\varvec{I}}_{n_b+n_c}, \end{aligned}$$
(40)
$$\begin{aligned} \hat{{\varvec{\varphi }}}_1(t)= & {} [\hat{{\varvec{c}}}^{\tiny \text{ T }}(t-1){\varvec{F}}^{\tiny \text{ T }}(t), \hat{{\varvec{\psi }}}^{\tiny \text{ T }}(t)]^{\tiny \text{ T }}, \end{aligned}$$
(41)
$$\begin{aligned} \hat{{\varvec{\varphi }}}_2(t)= & {} [{\varvec{\varphi }}^{\tiny \text{ T }}(t), \hat{{\varvec{a}}}^{\tiny \text{ T }}(t){\varvec{F}}(t)]^{\tiny \text{ T }}, \end{aligned}$$
(42)
$$\begin{aligned} {\varvec{\varphi }}(t)= & {} [u(t-1), u(t-2), \ldots , u(t-n_b)]^{\tiny \text{ T }}, \end{aligned}$$
(43)
$$\begin{aligned} \hat{{\varvec{\psi }}}(t)= & {} [\hat{v}(t-1), \hat{v}(t-2), \ldots , \hat{v}(t-n_d)]^{\tiny \text{ T }}, \end{aligned}$$
(44)
$$\begin{aligned} \hat{v}(t)= & {} y(t)-\hat{{\varvec{a}}}^{\tiny \text{ T }}(t){\varvec{F}}(t)\hat{{\varvec{c}}}(t)-{\varvec{\varphi }}^{\tiny \text{ T }}(t)\hat{{\varvec{b}}}(t)-\hat{{\varvec{\psi }}}^{\tiny \text{ T }}(t)\hat{{\varvec{d}}}(t), \end{aligned}$$
(45)
$$\begin{aligned} {\varvec{F}}(t)= & {} \left[ \begin{array}{cccc}f_1(y(t-1)) &{} f_2(y(t-1)) &{} \ldots &{} f_{n_c}(y(t-1))\\ f_1(y(t-2)) &{} f_2(y(t-2)) &{} \ldots &{} f_{n_c}(y(t-2))\\ \vdots &{} \vdots &{} \quad &{} \vdots \\ f_1(y(t-n_a)) &{} f_2(y(t-n_a)) &{} \ldots &{} f_{n_c}(y(t-n_a))\\ \end{array}\right] , \end{aligned}$$
(46)
$$\begin{aligned} \hat{{\varvec{\varTheta }}}(t)= & {} \left[ \begin{array}{cc} \hat{{\varvec{\theta }}}(t)\\ \hat{{\varvec{\vartheta }}}(t) \end{array}\right] . \end{aligned}$$
(47)
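The following numpy sketch (illustrative, under the assumption that \({\varvec{\varphi }}(t)\), \(\hat{{\varvec{\psi }}}(t)\) and \({\varvec{F}}(t)\) have already been formed from the data) implements one recursion of (35)–(47), including the normalization of \(\hat{{\varvec{c}}}(t)\) in (48); all names are ours.

```python
import numpy as np

def on_rls_step(y_t, F_t, phi_t, psi_hat_t, a_hat, d_hat, b_hat, c_hat, P1, P2):
    """One recursion of the ON-RLS algorithm (35)-(47)."""
    na, nb = len(a_hat), len(b_hat)

    # (41): phi1_hat(t) = [ c_hat(t-1)^T F(t)^T , psi_hat(t)^T ]^T
    phi1 = np.concatenate([F_t @ c_hat, psi_hat_t])
    # (36)-(37): gain vector and covariance update for the theta-subsystem
    L1 = P1 @ phi1 / (1.0 + phi1 @ P1 @ phi1)
    P1 = P1 - np.outer(L1, phi1 @ P1)
    # (35): theta = [a; d]
    e1 = y_t - phi_t @ b_hat - phi1 @ np.concatenate([a_hat, d_hat])
    theta = np.concatenate([a_hat, d_hat]) + L1 * e1
    a_hat, d_hat = theta[:na], theta[na:]

    # (42): phi2_hat(t) = [ phi(t)^T , a_hat(t)^T F(t) ]^T
    phi2 = np.concatenate([phi_t, a_hat @ F_t])
    # (39)-(40): gain vector and covariance update for the vartheta-subsystem
    L2 = P2 @ phi2 / (1.0 + phi2 @ P2 @ phi2)
    P2 = P2 - np.outer(L2, phi2 @ P2)
    # (38): vartheta = [b; c]
    e2 = y_t - psi_hat_t @ d_hat - phi2 @ np.concatenate([b_hat, c_hat])
    vartheta = np.concatenate([b_hat, c_hat]) + L2 * e2
    b_hat, c_raw = vartheta[:nb], vartheta[nb:]
    # (48): normalize c_hat so that its first component is positive
    c_hat = np.sign(c_raw[0]) * c_raw / np.linalg.norm(c_raw)

    # (45): residual used to build psi_hat at the next time step
    v_hat = y_t - a_hat @ F_t @ c_hat - phi_t @ b_hat - psi_hat_t @ d_hat
    return a_hat, d_hat, b_hat, c_hat, P1, P2, v_hat
```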

The procedures of computing the parameter estimation vectors \(\hat{{\varvec{\theta }}}(t)\) and \(\hat{{\varvec{\vartheta }}}(t)\) in (35)–(47) are listed in the following.

1. To initialize: let \(t=1\), and set the initial values \({\varvec{P}}_1(0)=p_0{\varvec{I}}_{n_a+n_d}\), \(\hat{{\varvec{\theta }}}(0)=\mathbf{1}_{n_a+n_d}/p_0\), \(\hat{{\varvec{b}}}(0)=\mathbf{1}_{n_b}/{p_0}\), \({\varvec{P}}_2(0)=p_0{\varvec{I}}_{n_b+n_c}\), \(p_0=10^6\) and \(\hat{{\varvec{c}}}(0)\) to a random vector with \(\Vert \hat{{\varvec{c}}}(0)\Vert =1\). Give the basis functions \(f_j(*)\).

2. Collect the input–output data u(t) and y(t), form \({\varvec{F}}(t)\) using (46), \({\varvec{\varphi }}(t)\) using (43), \(\hat{{\varvec{\psi }}}(t)\) using (44) and compute \(\hat{{\varvec{\varphi }}}_1(t)\) using (41).

3. Compute \({\varvec{L}}_1(t)\) using (36) and \({\varvec{P}}_1(t)\) using (37).

4. Update the parameter estimate \(\hat{{\varvec{\theta }}}(t)\) using (35) and read \(\hat{{\varvec{a}}}(t)\) and \(\hat{{\varvec{d}}}(t)\) from \(\hat{{\varvec{\theta }}}(t)=\left[ \begin{array}{c} \hat{{\varvec{a}}}(t) \\ \hat{{\varvec{d}}}(t) \end{array} \right] \).

5. Form \(\hat{{\varvec{\varphi }}}_2(t)\) using (42) and compute \({\varvec{L}}_2(t)\) using (39) and \({\varvec{P}}_2(t)\) using (40).

6. Update the parameter estimate \(\hat{{\varvec{\vartheta }}}(t)\) using (38), read \(\hat{{\varvec{b}}}(t)\) from \(\hat{{\varvec{\vartheta }}}(t)=\left[ \begin{array}{c} \hat{{\varvec{b}}}(t) \\ \hat{{\varvec{c}}}(t) \end{array} \right] \), normalize \(\hat{{\varvec{c}}}(t)\) using

   $$\begin{aligned} \hat{{\varvec{c}}}(t)=\mathrm{sgn}\{[\hat{{\varvec{\vartheta }}}(t)](n_b+1)\}\frac{[\hat{{\varvec{\vartheta }}}(t)](n_b+1:n_b+n_c)}{\Vert [\hat{{\varvec{\vartheta }}}(t)](n_b+1:n_b+n_c)\Vert }, \end{aligned}$$
   (48)

   and let \(\hat{{\varvec{\vartheta }}}(t)=\left[ \begin{array}{c} \hat{{\varvec{b}}}(t) \\ \hat{{\varvec{c}}}(t) \end{array} \right] \).

7. Compute \(\hat{v}(t)\) using (45).

8. Increase t by 1, go to Step 2 and continue the recursive calculation.

The flowchart for computing the estimates \(\hat{{\varvec{\theta }}}(t)\) and \(\hat{{\varvec{\vartheta }}}(t)\) in (35)–(47) is shown in Fig. 1.

Fig. 1 The flowchart of computing the parameter estimates \(\hat{{\varvec{\theta }}}(t)\) and \(\hat{{\varvec{\vartheta }}}(t)\)

To show the advantages of the proposed ON-RLS algorithm, the following gives the stochastic gradient algorithm with a forgetting factor \(\lambda \) for estimating the parameter vectors \({\varvec{\theta }}\) and \({\varvec{\vartheta }}\) of the nonlinear systems (the ON-SG algorithm for short) [11]:

$$\begin{aligned} \hat{{\varvec{\theta }}}(t)= & {} \hat{{\varvec{\theta }}}(t-1)+\frac{\hat{{\varvec{\varphi }}}_1(t)}{r_1(t)}[y(t)-{\varvec{\varphi }}^{\tiny \text{ T }}(t)\hat{{\varvec{b}}}(t-1)-\hat{{\varvec{\varphi }}}^{\tiny \text{ T }}_1(t)\hat{{\varvec{\theta }}}(t-1)], \end{aligned}$$
(49)
$$\begin{aligned} r_1(t)= & {} \lambda r_1(t-1)+\Vert \hat{{\varvec{\varphi }}}_1(t)\Vert ^2,\ 0\leqslant \lambda \leqslant 1,\ r_1(0)=1, \end{aligned}$$
(50)
$$\begin{aligned} \hat{{\varvec{\vartheta }}}(t)= & {} \hat{{\varvec{\vartheta }}}(t-1)+\frac{\hat{{\varvec{\varphi }}}_2(t)}{r_2(t)}[y(t)-\hat{{\varvec{\psi }}}^{\tiny \text{ T }}(t)\hat{{\varvec{d}}}(t)-\hat{{\varvec{\varphi }}}^{\tiny \text{ T }}_2(t)\hat{{\varvec{\vartheta }}}(t-1)], \end{aligned}$$
(51)
$$\begin{aligned} r_2(t)= & {} \lambda r_2(t-1)+\Vert \hat{{\varvec{\varphi }}}_2(t)\Vert ^2,\ r_2(0)=1. \end{aligned}$$
(52)
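For comparison, here is a sketch of the ON-SG update (49)–(50) for the \({\varvec{\theta }}\)-subsystem (the \({\varvec{\vartheta }}\)-subsystem in (51)–(52) is analogous); the names are illustrative.

```python
import numpy as np

def on_sg_theta_step(y_t, phi_t, phi1_hat, theta_hat, b_hat, r1, lam=0.99):
    """One ON-SG update of theta = [a; d], Eqs. (49)-(50)."""
    r1 = lam * r1 + phi1_hat @ phi1_hat                     # (50): step-size normalization
    e = y_t - phi_t @ b_hat - phi1_hat @ theta_hat          # innovation
    theta_hat = theta_hat + (phi1_hat / r1) * e             # (49): gradient-type update
    return theta_hat, r1
```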

Remark 1

The ON-RLS algorithm in (35)–(47) converges faster than the ON-SG algorithm in (49)–(52); see the last columns of Tables 3 and 4.

5 The Comparison of the Computational Efficiency

In order to show the advantage of the ON-RLS algorithm, the following briefly gives the recursive extended least squares algorithm in [21] for comparison.

Define the over-parameterized parameter vector,

$$\begin{aligned} {\varvec{\vartheta }}:=\left[ \begin{array}{c} {\varvec{a}}\otimes {\varvec{c}} \\ {\varvec{b}} \\ {\varvec{d}} \end{array}\right] \in {\mathbb R}^n, \quad n:=n_an_c+n_b+n_d. \end{aligned}$$

Then, we have the following recursive extended least squares (RELS) algorithm [21]:

$$\begin{aligned} \hat{{\varvec{\vartheta }}}(t)= & {} \hat{{\varvec{\vartheta }}}(t-1)+{\varvec{L}}(t)[y(t)-\hat{{\varvec{\varphi }}}^{\tiny \text{ T }}(t)\hat{{\varvec{\vartheta }}}(t-1)],\ \hat{{\varvec{\vartheta }}}(0)=\mathbf{1}_n/{p_0},\\ {\varvec{L}}(t)= & {} {\varvec{P}}(t-1)\hat{{\varvec{\varphi }}}(t)[1+\hat{{\varvec{\varphi }}}^{\tiny \text{ T }}(t){\varvec{P}}(t-1)\hat{{\varvec{\varphi }}}(t)]^{-1},\\ {\varvec{P}}(t)= & {} [{\varvec{I}}_n-{\varvec{L}}(t)\hat{{\varvec{\varphi }}}^{\tiny \text{ T }}(t)]{\varvec{P}}(t-1),\ {\varvec{P}}(0)=p_0{\varvec{I}}_n,\\ \hat{{\varvec{\varphi }}}(t)= & {} [{\varvec{h}}^{\tiny \text{ T }}(y(t-1)),\ldots , {\varvec{h}}^{\tiny \text{ T }}(y(t-n_a)), \\&\quad u(t-1), \ldots , u(t-n_b), \hat{v}(t-1),\ldots , \hat{v}(t-n_d)]^{\tiny \text{ T }},\\ {\varvec{h}}(y(t)):= & {} [f_1(y(t)), f_2(y(t)), \ldots , f_{n_c}(y(t))]^{\tiny \text{ T }}\in {\mathbb R}^{n_c},\\ \hat{v}(t)= & {} y(t)-\hat{{\varvec{\varphi }}}^{\tiny \text{ T }}(t)\hat{{\varvec{\vartheta }}}(t). \end{aligned}$$
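A minimal sketch of one RELS recursion (illustrative, assuming the over-parameterized regressor \(\hat{{\varvec{\varphi }}}(t)\) of dimension \(n=n_an_c+n_b+n_d\) has been formed as above):

```python
import numpy as np

def rels_step(y_t, phi_hat, vartheta_hat, P):
    """One recursion of the over-parameterized RELS algorithm of [21].

    phi_hat : (n,)   regressor [h(y(t-1)); ...; h(y(t-na)); u(t-1..t-nb); v_hat(t-1..t-nd)]
    P       : (n, n) covariance matrix, n = na*nc + nb + nd
    """
    L = P @ phi_hat / (1.0 + phi_hat @ P @ phi_hat)   # gain vector
    P = P - np.outer(L, phi_hat @ P)                  # covariance update
    vartheta_hat = vartheta_hat + L * (y_t - phi_hat @ vartheta_hat)
    v_hat = y_t - phi_hat @ vartheta_hat              # residual for the next regressor
    return vartheta_hat, P, v_hat
```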
Table 1 The computational efficiency of the ON-RLS algorithm
Table 2 The computational efficiency of the RELS algorithm

Remark 2

Compared with the recursive extended least squares algorithm, which involves a covariance matrix \({\varvec{P}}(t)\) of large size \((n_an_c+n_b+n_d)\times (n_an_c+n_b+n_d)\), the ON-RLS algorithm has a lower computational load because it involves two covariance matrices \({\varvec{P}}_1(t)\) and \({\varvec{P}}_2(t)\) of the smaller sizes \((n_a+n_d)\times (n_a+n_d)\) and \((n_b+n_c)\times (n_b+n_c)\); see the details in Tables 1 and 2. Simulation results (omitted in the paper) show that the parameter estimation errors given by the ON-RLS algorithm are very close to those given by the RELS algorithm.

As Golub and Van Loan [16] point out, flop (floating point operation) counting is a necessarily crude way of measuring program efficiency, since it ignores subscripting, memory traffic and the countless other overheads associated with program execution; it is only a “quick and dirty” accounting method that captures one of several dimensions of the efficiency issue, and it treats multiplications/divisions and additions/subtractions alike although their costs differ. Nevertheless, the flop numbers of the ON-RLS and RELS algorithms at each recursion are given in Tables 1 and 2. Their total flops are, respectively, given by

$$\begin{aligned} N_1:= & {} 4(n_a+n_d)^2+4(n_b+n_c)^2+4n_an_c+5n_a+10n_b+7n_c+10n_d,\\ N_2:= & {} 4(n_an_c+n_b+n_d)^2+6(n_an_c+n_b+n_d). \end{aligned}$$

In order to compare the computational efficiency of the two algorithms, we evaluate the difference between their computational loads. When \(n_a>2\) and \(n_c>2\), we have \(n_an_c>n_a+n_c\), so \(N_2>4(n_a+n_b+n_c+n_d)^2+6(n_a+n_b+n_c+n_d)\). Then, we have

$$\begin{aligned} N_2-N_1> & {} 4(n_a+n_b+n_c+n_d)^2+6(n_a+n_b+n_c+n_d)-4(n_a+n_d)^2\\&-\,4(n_b+n_c)^2-4n_an_c-5n_a-10n_b-7n_c-10n_d\\= & {} n_a+(8n_a-4)n_b+(4n_a-1)n_c+(8n_b+8n_c-4)n_d>0. \end{aligned}$$

It is clear that the ON-RLS algorithm requires less computational load than the RELS algorithm. For example, when \(n_a=n_b=n_c=n_d=5\), we have \(N_2-N_1=5110-1060=4050\) flops.
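The flop totals \(N_1\) and \(N_2\) can be checked numerically; the following snippet (illustrative) reproduces the figures quoted above for \(n_a=n_b=n_c=n_d=5\).

```python
def flops_on_rls(na, nb, nc, nd):
    # N1: ON-RLS flops per recursion (two small covariance matrices)
    return 4*(na + nd)**2 + 4*(nb + nc)**2 + 4*na*nc + 5*na + 10*nb + 7*nc + 10*nd

def flops_rels(na, nb, nc, nd):
    # N2: RELS flops per recursion (one large covariance matrix)
    n = na*nc + nb + nd
    return 4*n**2 + 6*n

print(flops_on_rls(5, 5, 5, 5), flops_rels(5, 5, 5, 5))   # 1060 5110 -> difference 4050
```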

6 Example

Consider the following nonlinear system:

$$\begin{aligned} y(t)= & {} A(z)f(y(t))+B(z)u(t)+D(z)v(t),\\ A(z)= & {} a_1z^{-1}+a_2z^{-2}=-0.75z^{-1}-0.61z^{-2},\\ B(z)= & {} b_1z^{-1}=0.96z^{-1},\\ D(z)= & {} 1+d_1z^{-1}=1+0.4z^{-1},\\ f(y(t))= & {} c_1y(t)+c_2{\sin }(y(t))=0.61y(t)+0.79{\sin }(y(t)),\\ {\varvec{\theta }}= & {} [a_1, a_2, d_1]^{\tiny \text{ T }}\\= & {} [-0.75, -0.61, 0.4]^{\tiny \text{ T }},\\ {\varvec{\vartheta }}= & {} [b_1, c_1, c_2]^{\tiny \text{ T }}=[0.96, 0.61, 0.79]^{\tiny \text{ T }},\\ {\varvec{\varTheta }}= & {} [a_1, a_2, d_1, b_1, c_1, c_2]^{\tiny \text{ T }}=[-0.75, -0.61, 0.4, 0.96, 0.61, 0.79]^{\tiny \text{ T }}. \end{aligned}$$

In simulation, the input \(\{u(t)\}\) is taken as a persistent excitation signal sequence with zero mean and unit variance, and \(\{v(t)\}\) is taken as a white noise sequence with zero mean and variance \(\sigma ^2=0.50^2\). We apply the ON-RLS algorithm and the ON-SG algorithm with \(\lambda =0.99\) to estimate the parameters of this system; the parameter estimates and errors are given in Tables 3 and 4, and the parameter estimation errors \(\delta :=\Vert \hat{{\varvec{\varTheta }}}(t)-{\varvec{\varTheta }}\Vert /\Vert {\varvec{\varTheta }}\Vert \) versus t are shown in Figs. 2 and 3.
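A sketch of the simulation configuration for this example (illustrative; the data generation and the ON-RLS recursion follow the sketches given in Sects. 2 and 4), with the relative estimation error \(\delta\) defined as above:

```python
import numpy as np

# True parameters of the example system (Sect. 6)
a_true = np.array([-0.75, -0.61])
b_true = np.array([0.96])
c_true = np.array([0.61, 0.79])
d_true = np.array([0.4])
sigma = 0.50                                  # noise standard deviation
Theta_true = np.concatenate([a_true, d_true, b_true, c_true])   # Theta = [a; d; b; c]

def estimation_error(Theta_hat):
    """Relative parameter estimation error delta = ||Theta_hat - Theta|| / ||Theta||."""
    return np.linalg.norm(Theta_hat - Theta_true) / np.linalg.norm(Theta_true)
```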

Table 3 The ON-RLS parameter estimates and errors
Table 4 The ON-SG parameter estimates and errors
Fig. 2 The ON-RLS estimation errors versus t (\(\sigma ^2=0.50^2\))

From Tables 3, 4 and Figs. 2, 3, we can draw the following conclusions.

  • The parameter estimation errors given by the two algorithms become smaller as the data length increases.

  • The parameter estimation accuracy of the ON-RLS algorithm is higher than that of the ON-SG algorithm.

  • The parameter estimates given by the ON-RLS algorithm converge to their true values faster than those given by the ON-SG algorithm, even for appropriately chosen forgetting factors.

Fig. 3 The ON-SG estimation errors versus t (\(\sigma ^2=0.50^2\))

7 Conclusions

Using the hierarchical identification principle, a recursive least squares algorithm is derived for a special class of output nonlinear systems by decomposing the nonlinear system into two identification models. The proposed algorithm gives satisfactory identification accuracy and has higher computational efficiency than the recursive extended least squares parameter estimation algorithm in [21]. The proposed algorithm can be extended to study identification problems of multivariable systems [13], linear-in-parameters systems [41, 42] and impulsive dynamical systems [23, 24], and it can be applied to other fields [45–47].