1 Introduction

Block-oriented nonlinear systems are widely used for modeling and analyzing nonlinear problems in various areas, such as parameter estimation [1, 2], energy harvesting systems [3], signal processing [4–6], predictive control [7, 8] and system identification [9, 10]. For decades, many approaches have been developed for the identification and parameter estimation of linear and nonlinear dynamic systems. These approaches can not only be applied to obtain the mathematical models of the systems [11] but also play an important role in analyzing controlled dynamic systems [12, 13]. For example, Hagenblad et al. [14] derived a maximum likelihood identification method for Wiener models. By the key-term separation principle, Vörös [15] solved the parameter identification problem of nonlinear dynamic systems with both actuator and sensor nonlinearities using three-block cascade models. Based on the least squares principle, Hu et al. [16] derived a recursive extended least squares algorithm for identifying Wiener nonlinear moving average systems. By using the polynomial nonlinear state space approach, Paduart et al. [17] identified a nonlinear system with a Wiener–Hammerstein structure. By using the maximum likelihood method, Sun and Liu [18] offered an APSO-aided identification algorithm to identify Hammerstein systems.

The model decomposition technique can be used to separate a large-scale system into several small subsystems and to enhance the computational efficiency [19, 20]. Recently, Zhang [21] proposed a three-stage least squares iterative identification algorithm for output error moving average systems using the model decomposition; Bai and Liu [22] presented a normalized iterative method to find a least squares solution for a system of bilinear equations by using the decomposition technique; Wang and Ding [23] separated a bilinear-parameter cost function into two linear-parameter cost functions and derived least squares-based and gradient-based iterative identification algorithms for Wiener nonlinear systems.

The filtering technique has proved to be effective in parameter estimation [24, 25] and state estimation [26]. Recently, Zhao et al. [27] studied a maximum likelihood method to obtain the parameter estimates of batch processes by employing the particle filtering approach; Ding et al. [28] used the filtering technique to derive a recursive least squares parameter identification algorithm for systems with colored noise; Wang and Tang [29] presented a filtered three-stage gradient-based iterative algorithm for a class of linear-in-parameters output error autoregressive systems by using the model decomposition and the data filtering technique.

By extending the methods in [21, 30] from linear systems to input nonlinear output error autoregressive (IN-OEAR) systems, this paper studies their iterative identification problem. The objective is to decompose a bilinear-parameter system into two fictitious subsystems by using the model decomposition and to present a least squares-based iterative algorithm for the IN-OEAR system. Furthermore, using an estimated noise transfer function to filter the input–output data of the system to be identified, a data filtering-based least squares iterative algorithm is presented. Compared with the least squares iterative algorithm, the filtering-based least squares iterative algorithm achieves higher estimation accuracy. The proposed algorithms differ from the least squares and gradient-based iterative algorithms for Hammerstein nonlinear ARMAX systems based on the over-parameterization method in [31].

Briefly, the rest of this paper is organized as follows: Section 2 gives the identification model of the IN-OEAR systems. Section 3 presents a least squares iterative identification algorithm by using the model decomposition. Section 4 derives a filtering-based least squares iterative identification algorithm for the IN-OEAR systems. Section 5 gives the filtering-based algorithm for the case of finite measurement data. A numerical example is provided in Sect. 6 to show the effectiveness of the proposed algorithms. Finally, we give some concluding remarks in Sect. 7.

2 System description

The typical block-oriented nonlinear models include Hammerstein models (a static nonlinear block followed by a linear dynamic block), Wiener models (a linear dynamic block followed by a static nonlinear block), Hammerstein–Wiener models and Wiener–Hammerstein models. Here, we consider a Hammerstein nonlinear system with colored noise in Fig. 1,

$$\begin{aligned} y(t)= & {} \frac{B(z)}{A(z)}\bar{u}(t)+w(t), \end{aligned}$$
(1)
$$\begin{aligned} \bar{u}(t)= & {} f(u(t)), \end{aligned}$$
(2)

where y(t) is the measured output, w(t) is the disturbance with zero mean, u(t) and \(\bar{u}(t)\) are the input and output of the nonlinear block, respectively, and A(z) and B(z) are polynomials in the unit backward shift operator \(z^{-1} (z^{-1}y(t)=y(t-1)\)):

$$\begin{aligned} A(z):= & {} 1+a_1z^{-1}+a_2z^{-2}+\cdots +a_{n_a}z^{-n_a}, \\ B(z):= & {} b_1z^{-1}+b_2z^{-2}+\cdots +b_{n_b}z^{-n_b}. \end{aligned}$$
Fig. 1
figure 1

The Hammerstein nonlinear system with colored noise

Assume that the orders \(n_a\) and \(n_b\) are known and \(y(t)=0\), \(u(t)=0\) and \(v(t)=0\) for \(t\leqslant 0\). The output of the nonlinear block is a linear combination of the known basis functions \(f_j(\cdot )\) and unknown coefficients \(\alpha _j\):

$$\begin{aligned} \bar{u}(t)= & {} \alpha _1f_1(u(t))+\alpha _2f_2(u(t))+\cdots +\alpha _mf_m(u(t))\\= & {} \varvec{f}(u(t)){{\varvec{\alpha }}}, \end{aligned}$$

where

$$\begin{aligned}&{\varvec{\alpha }}:=\left[ \alpha _1, \alpha _2, \ldots , \alpha _m\right] ^{\mathrm{T}}\in {\mathbb R}^m,\\&\varvec{f}(u(t)):=\left[ f_1(u(t)),f_2(u(t)),\ldots ,f_m(u(t))\right] \in {\mathbb R}^{1\times m}. \end{aligned}$$

The basis functions \(f_j(\cdot )\) can be, for example, known powers of the input (e.g., \(f_j(u(t))=u^j(t)\)) or trigonometric functions of the input.

For the system with colored noise, the disturbance w(t) can be fitted by an autoregressive process

$$\begin{aligned} w(t)=\frac{1}{C(z)}v(t), \end{aligned}$$
(3)

or a moving average process

$$\begin{aligned} w(t)=D(z)v(t), \end{aligned}$$

or an autoregressive moving average process

$$\begin{aligned} w(t)=\frac{D(z)}{C(z)}v(t), \end{aligned}$$

where v(t) is white noise with zero mean and variance \(\sigma ^2\), and C(z) and D(z) are polynomials in the unit backward shift operator \(z^{-1}\) [32]:

$$\begin{aligned} C(z):= & {} 1+c_1z^{-1}+c_2z^{-2}+\cdots +c_{n_c}z^{-n_c},\\ D(z):= & {} 1+d_1z^{-1}+d_2z^{-2}+\cdots +d_{n_d}z^{-n_d}. \end{aligned}$$

This paper assumes the disturbance to be an autoregressive process, and the proposed algorithms can be extended to the other two cases.

Define an intermediate variable:

$$\begin{aligned} x(t):=\frac{B(z)}{A(z)}\bar{u}(t). \end{aligned}$$
(4)

Define the parameter vectors and the information vectors as

$$\begin{aligned}&\varvec{a}:=\left[ \begin{array}{c} a_1 \\ a_2 \\ \vdots \\ a_{n_a} \end{array}\right] \in {\mathbb R}^{n_a}, \ \ \varvec{b}:=\left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_{n_b} \end{array}\right] \in {\mathbb R}^{n_b}, \\&\varvec{c}:=\left[ \begin{array}{c} c_1 \\ c_2 \\ \vdots \\ c_{n_c} \end{array}\right] \in {\mathbb R}^{n_c},\\&\varvec{F}(t):=\left[ \begin{array}{c} \varvec{f}(u(t-1)) \\ \varvec{f}(u(t-2)) \\ \vdots \\ \varvec{f}(u(t-n_b)) \end{array}\right] \in {\mathbb R}^{n_b\times m},\\&{\varvec{{\varphi }}}_\mathrm{a}(t):=\left[ -x(t-1),-x(t-2),\right. \\&\qquad \qquad \quad \left. \ldots ,-x(t-n_a)\right] ^{\mathrm{T}} \in {\mathbb R}^{n_a},\\&{\varvec{{\varphi }}}_\mathrm{n}(t):=\left[ -w(t-1),-w(t-2),\right. \\&\qquad \qquad \quad \left. \ldots ,-w(t-n_c)\right] ^{\mathrm{T}}\in {\mathbb R}^{n_c}. \end{aligned}$$

From (3) and (4), we have

$$\begin{aligned} w(t)= & {} [1-C(z)]w(t)+v(t)\nonumber \\= & {} -c_1w(t-1)-c_2w(t-2)-\cdots \nonumber \\&-\,c_{n_c}w(t-n_c)+v(t)\nonumber \\= & {} {\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{n}(t)\varvec{c}+v(t), \end{aligned}$$
(5)
$$\begin{aligned} x(t)= & {} [1-A(z)]x(t)+B(z)\bar{u}(t)\nonumber \\= & {} -a_1x(t-1)-a_2x(t-2)-\cdots \nonumber \\&-\,a_{n_a}x(t-n_a)+b_1\varvec{f}(u(t-1)){\varvec{\alpha }}\nonumber \\&+\,b_2\varvec{f}(u(t-2)){\varvec{\alpha }}\nonumber \\&+\cdots +b_{n_b}\varvec{f}(u(t-n_b)){\varvec{\alpha }}\nonumber \\= & {} {\varvec{{\varphi }}}_{\mathrm{a}}^{\mathrm{T}}(t)\varvec{a}+\varvec{b}^{\mathrm{T}}\varvec{F}(t){\varvec{\alpha }}. \end{aligned}$$
(6)

The output y(t) in (1) can be expressed as

$$\begin{aligned} y(t)= & {} x(t)+w(t) \end{aligned}$$
(7)
$$\begin{aligned}= & {} {\varvec{{\varphi }}}_\mathrm{a}^{\mathrm{T}}(t)\varvec{a}+\varvec{b}^{\mathrm{T}}\varvec{F}(t){\varvec{\alpha }}+{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{n}(t)\varvec{c}+v(t). \end{aligned}$$
(8)

This is the identification model for the Hammerstein nonlinear system.
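For readers who prefer a computational view, the following Python sketch simulates input–output data from the identification model (5)–(8); the basis functions and parameter values are taken from the numerical example in Sect. 6, and the variable names are illustrative assumptions rather than part of the model.

```python
import numpy as np

# Minimal sketch: simulate y(t) = B(z)/A(z)*ubar(t) + 1/C(z)*v(t)
# with ubar(t) = alpha1*u(t)^2 + alpha2*u(t)^3 (values from the example in Sect. 6).
rng = np.random.default_rng(0)
a = np.array([0.38, 0.42])            # A(z) = 1 + a1 z^-1 + a2 z^-2
b = np.array([0.75, -0.33])           # B(z) = b1 z^-1 + b2 z^-2
c = np.array([0.85])                  # C(z) = 1 + c1 z^-1
alpha = np.array([0.80, 0.60])        # coefficients of the nonlinear block
f = lambda u: np.array([u**2, u**3])  # basis functions f_1, f_2

N, sigma = 3000, 0.50
u = rng.standard_normal(N)            # persistently exciting input
v = sigma * rng.standard_normal(N)    # white noise

x = np.zeros(N)                       # intermediate variable x(t), Eq. (4)
w = np.zeros(N)                       # colored disturbance w(t), Eq. (3)
y = np.zeros(N)
for t in range(N):
    ubar = lambda s: f(u[s]) @ alpha if s >= 0 else 0.0
    # x(t) = -a1 x(t-1) - a2 x(t-2) + b1 ubar(t-1) + b2 ubar(t-2), Eq. (6)
    x[t] = (-sum(a[i] * x[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
            + sum(b[i] * ubar(t - 1 - i) for i in range(len(b))))
    # w(t) = -c1 w(t-1) + v(t), Eq. (5)
    w[t] = -sum(c[i] * w[t - 1 - i] for i in range(len(c)) if t - 1 - i >= 0) + v[t]
    y[t] = x[t] + w[t]                # Eq. (7)
```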

3 The decomposition-based least squares iterative algorithm

It is worth pointing out that model (8) contains the product of the parameters \(\varvec{b}\) of the linear part and \({\varvec{\alpha }}\) of the nonlinear part. The pair \(\beta \varvec{b}\) and \({\varvec{\alpha }}/\beta \) leads to the same input–output relation for any nonzero constant \(\beta \). In order to ensure identifiability, we assume that \(\Vert {\varvec{\alpha }}\Vert =1\) and the first entry of the vector \({\varvec{\alpha }}\) is positive, i.e., \(\alpha _1>0\). Although we can use the Kronecker product to transform the bilinear-parameter identification problem to a linear-parameter identification problem [33, 34], the dimension of the resulting unknown parameter vector increases, and so does the computational load. Here, we decompose this system into two fictitious subsystems: one containing the parameter vector \({\varvec{{\theta }}}:=\left[ \begin{array}{c} \varvec{a} \\ \varvec{b} \end{array} \right] \), and the other containing the parameter vector \({\varvec{{\vartheta }}}:=\left[ \begin{array}{c} {\varvec{\alpha }} \\ \varvec{c} \end{array} \right] \). Let \(k=1,2,3,\ldots \) be an iteration variable, \(\hat{{\varvec{{\theta }}}}_k(t):=\left[ \begin{array}{c} \hat{\varvec{a}}_k(t) \\ \hat{\varvec{b}}_k(t) \end{array} \right] \) and \(\hat{{\varvec{{\vartheta }}}}_k(t):=\left[ \begin{array}{c} \hat{{\varvec{\alpha }}}_k(t) \\ \hat{\varvec{c}}_k(t) \end{array} \right] \) be the estimates of \({\varvec{{\theta }}}\) and \({\varvec{{\vartheta }}}\) at iteration k. Define two fictitious outputs:

$$\begin{aligned} y_1(t):= & {} y(t)-{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{n}(t)\varvec{c}\nonumber \\= & {} {\varvec{{\varphi }}}_\mathrm{a}^{\mathrm{T}}(t)\varvec{a}+\varvec{b}^{\mathrm{T}}\varvec{F}(t){\varvec{\alpha }}+v(t),\nonumber \\= & {} {\varvec{{\varphi }}}^{\mathrm{T}}_1(t){\varvec{{\theta }}}+v(t), \end{aligned}$$
(9)
$$\begin{aligned} y_2(t):= & {} y(t)-{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{a}(t)\varvec{a}\nonumber \\= & {} \varvec{b}^{\mathrm{T}}\varvec{F}(t){\varvec{\alpha }}+{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{n}(t)\varvec{c}+v(t),\nonumber \\= & {} {\varvec{{\varphi }}}^{\mathrm{T}}_2(t){\varvec{{\vartheta }}}+v(t), \end{aligned}$$
(10)

where

$$\begin{aligned} {\varvec{{\varphi }}}_1(t):= & {} \left[ \begin{array}{c} {\varvec{{\varphi }}}_\mathrm{a}(t) \\ \varvec{F}(t){\varvec{\alpha }} \end{array} \right] \in {\mathbb R}^{n_a+n_b},\\ {\varvec{{\varphi }}}_2(t):= & {} \left[ \begin{array}{c} \varvec{F}^{\mathrm{T}}(t)\varvec{b} \\ {\varvec{{\varphi }}}_\mathrm{n}(t) \end{array} \right] \in {\mathbb R}^{m+n_c}. \end{aligned}$$

Consider a batch of data from \(j=t-L+1\) to \(j=t\) (L denotes the data length) and define two quadratic criterion functions:

$$\begin{aligned} J_1({\varvec{{\theta }}})= & {} \sum _{j=t-L+1}^t\left[ y_1(j)-{\varvec{{\varphi }}}^{\mathrm{T}}_1(j){\varvec{{\theta }}}\right] ^2,\\ J_2({\varvec{{\vartheta }}})= & {} \sum _{j=t-L+1}^t\left[ y_2(j)-{\varvec{{\varphi }}}^{\mathrm{T}}_2(j){\varvec{{\vartheta }}}\right] ^2. \end{aligned}$$

Based on the least squares principle, letting the partial derivative of \(J_1({\varvec{{\theta }}})\) and \(J_2({\varvec{{\vartheta }}})\) with respect to \({\varvec{{\theta }}}\) and \({\varvec{{\vartheta }}}\) be zero, respectively, we can obtain the following least squares iterative algorithm:

$$\begin{aligned} \hat{{\varvec{{\theta }}}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t{\varvec{{\varphi }}}_1(j){\varvec{{\varphi }}}^{\mathrm{T}}_1(j)\right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t{\varvec{{\varphi }}}_1(j)y_1(j), \end{aligned}$$
(11)
$$\begin{aligned} \hat{{\varvec{{\vartheta }}}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t{\varvec{{\varphi }}}_2(j){\varvec{{\varphi }}}^{\mathrm{T}}_2(j)\right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t{\varvec{{\varphi }}}_2(j)y_2(j). \end{aligned}$$
(12)
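Computationally, (11) and (12) are ordinary batch least squares solutions. A minimal sketch of this kernel is given below; the matrix Phi and vector y1 are hypothetical stand-ins for the stacked regressors \({\varvec{{\varphi }}}_1(j)\) and outputs \(y_1(j)\) over the data window.

```python
import numpy as np

def batch_ls(Phi, y):
    """Batch least squares estimate theta = [sum phi phi^T]^{-1} sum phi*y,
    written with the stacked regressor matrix Phi (L x n) and output vector y (L,)."""
    # Solving via lstsq is numerically preferable to forming an explicit inverse.
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

# Hypothetical illustration with random data of the right shapes.
L, n = 1000, 4
Phi = np.random.randn(L, n)
theta_true = np.array([0.5, -0.3, 0.8, 0.1])
y1 = Phi @ theta_true + 0.01 * np.random.randn(L)
print(batch_ls(Phi, y1))   # close to theta_true
```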

Substituting (9) into (11) and (10) into (12) gives

$$\begin{aligned} \hat{{\varvec{{\theta }}}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t{\varvec{{\varphi }}}_1(j){\varvec{{\varphi }}}^{\mathrm{T}}_1(j)\right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t{\varvec{{\varphi }}}_1(j)[y(j)\nonumber \\&-\,{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{n}(j)\varvec{c}], \end{aligned}$$
(13)
$$\begin{aligned} \hat{{\varvec{{\vartheta }}}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t{\varvec{{\varphi }}}_2(j){\varvec{{\varphi }}}^{\mathrm{T}}_2(j)\right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t{\varvec{{\varphi }}}_2(j)[y(j)\nonumber \\&-\,{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{a}(j)\varvec{a}]. \end{aligned}$$
(14)

The difficulty is that the right-hand sides of (13) and (14) contain the unknown parameter vectors \(\varvec{c}\) and \(\varvec{a}\), the information vectors \({\varvec{{\varphi }}}_1(t)\) and \({\varvec{{\varphi }}}_2(t)\) contain the unknown parameter vectors \({\varvec{\alpha }}\) and \(\varvec{b}\) and the unknown intermediate variables \(x(t-i)\) and \(w(t-i)\), so it is impossible to compute \(\hat{{\varvec{{\theta }}}}_k(t)\) and \(\hat{{\varvec{{\vartheta }}}}_k(t)\) by (13) and (14) directly. Here, the solution is based on the hierarchical identification principle [35]. Let \(\hat{w}_k(t-i)\) and \(\hat{x}_k(t-i)\) be the estimates of \(w(t-i)\) and \(x(t-i)\) at iteration k, \(\hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}(t)\), \(\hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(t)\), \(\hat{{\varvec{{\varphi }}}}_{1,k}(t)\), and \(\hat{{\varvec{{\varphi }}}}_{2,k}(t)\) be the estimates of \({\varvec{{\varphi }}}_{\mathrm{a}}(t)\), \({\varvec{{\varphi }}}_{\mathrm{n}}(t)\), \({\varvec{{\varphi }}}_1(t)\), and \({\varvec{{\varphi }}}_2(t)\) at iteration k and define

$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}(t):= & {} \left[ -\hat{x}_{k-1}(t-1),-\hat{x}_{k-1}(t-2),\ldots ,\right. \\&\left. -\,\hat{x}_{k-1}(t-n_a)\right] ^{\mathrm{T}}\in {\mathbb R}^{n_a},\\ \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(t):= & {} \left[ -\hat{w}_{k-1}(t-1),-\hat{w}_{k-1}(t-2),\ldots ,\right. \\&\left. -\,\hat{w}_{k-1}(t-n_c)\right] ^{\mathrm{T}}\in {\mathbb R}^{n_c},\\ \hat{{\varvec{{\varphi }}}}_{1,k}(t):= & {} \left[ \begin{array}{c} \hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}(t) \\ \varvec{F}(t)\hat{{\varvec{\alpha }}}_{k-1}(t) \end{array} \right] \in {\mathbb R}^{n_a+n_b},\\ \hat{{\varvec{{\varphi }}}}_{2,k}(t):= & {} \left[ \begin{array}{c} \varvec{F}^{\mathrm{T}}(t)\hat{\varvec{b}}_k(t) \\ \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(t) \end{array} \right] \in {\mathbb R}^{m+n_c}. \end{aligned}$$

From (6), we have \(x(t-i)={\varvec{{\varphi }}}^{\mathrm{T}}_{\mathrm{a}}(t-i)\varvec{a}+\varvec{b}^{\mathrm{T}}\varvec{F}(t-i){\varvec{\alpha }}\). Replacing \({\varvec{{\varphi }}}_{\mathrm{a}}(t-i)\), \(\varvec{a}\), \(\varvec{b}\) and \({\varvec{\alpha }}\) with their estimates \(\hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}(t-i)\), \(\hat{\varvec{a}}_k(t)\), \(\hat{\varvec{b}}_k(t)\) and \(\hat{{\varvec{\alpha }}}_k(t)\) gives

$$\begin{aligned} \hat{x}_k(t-i)=\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{a},k}(t-i)\hat{\varvec{a}}_k(t)+\hat{\varvec{b}}^{\mathrm{T}}_k(t)\varvec{F}(t-i)\hat{{\varvec{\alpha }}}_k(t). \end{aligned}$$

From (7), we have \(w(t-i)=y(t-i)-x(t-i)\). Replacing \(x(t-i)\) with \(\hat{x}_k(t-i)\), we can compute the estimate of \(w(t-i)\) through:

$$\begin{aligned} \hat{w}_k(t-i)=y(t-i)-\hat{x}_k(t-i). \end{aligned}$$

Replacing the unknown \(\varvec{c}\) and \({\varvec{{\varphi }}}_1(t)\) in (13) with their estimates \(\hat{\varvec{c}}_{k-1}(t)\) and \(\hat{{\varvec{{\varphi }}}}_{1,k}(t)\), the unknown \(\varvec{a}\), \({\varvec{{\varphi }}}_2(t)\) and \({\varvec{{\varphi }}}_{\mathrm{a}}(t)\) in (14) with their estimates \(\hat{\varvec{a}}_k(t)\), \(\hat{{\varvec{{\varphi }}}}_{2,k}(t)\) and \(\hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}(t)\), we can summarize the decomposition-based least squares iterative (D-LSI) algorithm for estimating \({\varvec{{\theta }}}\) and \({\varvec{{\vartheta }}}\) as follows:

$$\begin{aligned} \hat{{\varvec{{\theta }}}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{1,k}(j)\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{1,k}(j) \right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{1,k}(j)\left[ y(j)-\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{n},k}(j)\hat{\varvec{c}}_{k-1}(j)\right] ,\end{aligned}$$
(15)
$$\begin{aligned} \hat{{\varvec{{\vartheta }}}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{2,k}(j)\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{2,k}(j) \right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{2,k}(j)\left[ y(j)-\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{a},k}(j)\hat{\varvec{a}}_k(j)\right] ,\nonumber \\ \end{aligned}$$
(16)
$$\begin{aligned}&\hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}(t)=\left[ -\hat{x}_{k-1}(t-1),-\hat{x}_{k-1}(t-2),\ldots ,\right. \nonumber \\&\qquad \qquad \quad \left. -\,\hat{x}_{k-1}(t-n_a)\right] ^{\mathrm{T}}, \end{aligned}$$
(17)
$$\begin{aligned}&\hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(t)=\left[ -\hat{w}_{k-1}(t-1),-\hat{w}_{k-1}(t-2),\ldots ,\right. \nonumber \\&\qquad \qquad \quad \left. -\,\hat{w}_{k-1}(t-n_c)\right] ^{\mathrm{T}}, \end{aligned}$$
(18)
$$\begin{aligned}&\hat{{\varvec{{\varphi }}}}_{1,k}(t)=\left[ \hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{a},k}(t),\hat{{\varvec{\alpha }}}^{\mathrm{T}}_{k-1}(t)\varvec{F}^{\mathrm{T}}(t)\right] ^{\mathrm{T}}, \end{aligned}$$
(19)
$$\begin{aligned}&\hat{{\varvec{{\varphi }}}}_{2,k}(t)=\left[ \hat{\varvec{b}}^{\mathrm{T}}_k(t)\varvec{F}(t),\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{n},k}(t)\right] ^{\mathrm{T}}, \end{aligned}$$
(20)
$$\begin{aligned}&\hat{x}_k(t-i)=\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{a},k}(t-i)\hat{\varvec{a}}_k(t)+\hat{\varvec{b}}^{\mathrm{T}}_k(t)\varvec{F}(t-i)\hat{{\varvec{\alpha }}}_k(t),\nonumber \\&\qquad \qquad \,\qquad i=1,2,\ldots ,n_a \end{aligned}$$
(21)
$$\begin{aligned}&\hat{w}_k(t-j)=y(t-j)-\hat{x}_k(t-j),\ \ j=1,2,\ldots ,n_c\nonumber \\ \end{aligned}$$
(22)
$$\begin{aligned} \varvec{F}(t)= & {} \left[ \begin{array}{cccc}f_1(u(t-1))&{}f_2(u(t-1)) &{} \cdots &{} f_m(u(t-1)) \\ f_1(u(t-2))&{}f_2(u(t-2)) &{} \cdots &{} f_m(u(t-2)) \\ \vdots &{}\vdots &{} &{} \vdots \\ f_1(u(t-n_b))&{}f_2(u(t-n_b)) &{}\cdots &{}f_m(u(t-n_b)) \end{array} \right] ,\nonumber \\ \end{aligned}$$
(23)
$$\begin{aligned} \hat{\varvec{a}}_k(t)= & {} \hat{{\varvec{{\theta }}}}_k(t)(1:n_a), \end{aligned}$$
(24)
$$\begin{aligned} \hat{\varvec{b}}_k(t)= & {} \hat{{\varvec{{\theta }}}}_k(t)(n_a+1:n_a+n_b), \end{aligned}$$
(25)
$$\begin{aligned} \hat{{\varvec{\alpha }}}_k(t)= & {} \mathrm{{sgn}}\left[ \hat{{\varvec{{\vartheta }}}}_k(t)(1)\right] \frac{\hat{{\varvec{{\vartheta }}}}_k(t)(1:m)}{\Vert \hat{{\varvec{{\vartheta }}}}_k(t)(1:m)\Vert }. \end{aligned}$$
(26)

To initialize the D-LSI algorithm, the initial value \(\hat{{\varvec{{\theta }}}}_0(t)=\left[ \begin{array}{c} \hat{\varvec{a}}_0(t) \\ \hat{\varvec{b}}_0(t) \end{array} \right] \) is generally taken to be a nonzero vector with \(\hat{\varvec{b}}_0(t)\ne 0\), \(\hat{{\varvec{\alpha }}}_0(t)\) is taken to be a vector with \(\Vert \hat{{\varvec{\alpha }}}_0(t)\Vert =1\), and \(\hat{\varvec{c}}_0(t)\) is taken to be an arbitrary real vector. The initial values of the intermediate variables \(\hat{w}_0(t-i)\) and \(\hat{x}_0(t-i)\) are taken to be random numbers. At each iteration, the noise parameter estimate is read from the last \(n_c\) entries of \(\hat{{\varvec{{\vartheta }}}}_k(t)\), i.e., \(\hat{\varvec{c}}_k(t)=\hat{{\varvec{{\vartheta }}}}_k(t)(m+1:m+n_c)\).
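The following Python sketch summarizes one iteration of the D-LSI algorithm (15)–(26) under several assumptions: a polynomial basis \(f(u)=[u^2,u^3]\), zero-padded initial data, and a refresh of the whole data window of \(\hat{x}_k\) and \(\hat{w}_k\) rather than only the lags appearing in (21)–(22); the function and variable names are illustrative and not part of the original algorithm statement.

```python
import numpy as np

def basis(u_t):
    # assumed polynomial basis f(u) = [u^2, u^3]; replace with the known basis functions
    return np.array([u_t**2, u_t**3])

def F_mat(u, t, nb):
    # information matrix F(t) in (23): row i holds f(u(t-i))
    return np.array([basis(u[t - i]) for i in range(1, nb + 1)])

def dlsi_iteration(u, y, x_prev, w_prev, alpha_prev, c_prev, na, nb, nc, L):
    """One D-LSI iteration over the window j = t-L+1,...,t (here the last L samples)."""
    t_end = len(y)
    idx = range(t_end - L, t_end)
    m = len(alpha_prev)
    phi_a = {j: -x_prev[j - na:j][::-1] for j in idx}   # (17)
    phi_n = {j: -w_prev[j - nc:j][::-1] for j in idx}   # (18)
    F = {j: F_mat(u, j, nb) for j in idx}
    # sub-algorithm 1: estimate theta = [a; b] from (15)
    Phi1 = np.array([np.concatenate([phi_a[j], F[j] @ alpha_prev]) for j in idx])
    y1 = np.array([y[j] - phi_n[j] @ c_prev for j in idx])
    theta, *_ = np.linalg.lstsq(Phi1, y1, rcond=None)
    a_k, b_k = theta[:na], theta[na:]
    # sub-algorithm 2: estimate vartheta = [alpha; c] from (16)
    Phi2 = np.array([np.concatenate([F[j].T @ b_k, phi_n[j]]) for j in idx])
    y2 = np.array([y[j] - phi_a[j] @ a_k for j in idx])
    vartheta, *_ = np.linalg.lstsq(Phi2, y2, rcond=None)
    alpha_k = np.sign(vartheta[0]) * vartheta[:m] / np.linalg.norm(vartheta[:m])  # (26)
    c_k = vartheta[m:]
    # refresh the intermediate variables x and w for the next iteration, cf. (21)-(22)
    x_k, w_k = x_prev.copy(), w_prev.copy()
    for j in idx:
        x_k[j] = phi_a[j] @ a_k + b_k @ (F[j] @ alpha_k)
        w_k[j] = y[j] - x_k[j]
    return a_k, b_k, alpha_k, c_k, x_k, w_k

# hypothetical usage: one pass over a window of L samples (measured data would go here)
N = 1000
u = np.random.randn(N); y = np.random.randn(N)
x0, w0 = np.random.randn(N), np.random.randn(N)        # random initial intermediate values
out = dlsi_iteration(u, y, x0, w0, np.array([1.0, 0.0]), np.array([0.0]),
                     na=2, nb=2, nc=1, L=900)
```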

4 The filtering-based least squares iterative algorithm

Using the polynomial C(z) (a linear filter) to filter the input–output data, the model in (1) can be transformed into two identification models: an input nonlinear output error model with white noise and an autoregressive noise model. Multiplying both sides of Eq. (1) by C(z) yields

$$\begin{aligned} C(z)y(t)=\frac{B(z)}{A(z)}C(z)\bar{u}(t)+v(t). \end{aligned}$$
(27)

Define the filtered output \(y_\mathrm{f}(t)\) and input \(\bar{u}_\mathrm{f}(t)\):

$$\begin{aligned} y_\mathrm{f}(t):= & {} C(z)y(t)\\= & {} y(t)+c_1y(t-1)+c_2y(t-2)\\&+\cdots +c_{n_c}y(t-n_c),\\ \bar{u}_\mathrm{f}(t):= & {} C(z)\bar{u}(t)\\= & {} C(z)[\alpha _1f_1(u(t))+\alpha _2f_2(u(t))\\&+\cdots +\alpha _mf_m(u(t))]\\= & {} \alpha _1 g_1(t)+\alpha _2 g_2(t)+\cdots +\alpha _m g_m(t), \end{aligned}$$

where

$$\begin{aligned} g_j(t):=C(z)f_j(u(t)), \ \ j=1,2,\ldots ,m. \end{aligned}$$

Define an information matrix:

$$\begin{aligned} \varvec{G}(t):= & {} \left[ \begin{array}{cccc}g_1(t-1) &{} g_2(t-1) &{} \cdots &{} g_m(t-1)\\ g_1(t-2) &{} g_2(t-2) &{} \cdots &{} g_m(t-2)\\ \vdots &{} \vdots &{} &{} \vdots \\ g_1(t-n_b) &{} g_2(t-n_b) &{} \cdots &{} g_m(t-n_b)\\ \end{array}\right] \\&\in {\mathbb R}^{n_b\times m}. \end{aligned}$$

Then, Eq. (27) can be rewritten as

$$\begin{aligned} y_\mathrm{f}(t)=\frac{B(z)}{A(z)}\bar{u}_\mathrm{f}(t)+v(t). \end{aligned}$$

Define an intermediate variable:

$$\begin{aligned} x_\mathrm{f}(t):=\frac{B(z)}{A(z)}\bar{u}_\mathrm{f}(t). \end{aligned}$$

Then, we have

$$\begin{aligned} x_\mathrm{f}(t)= & {} [1-A(z)]x_\mathrm{f}(t)+B(z)\bar{u}_\mathrm{f}(t)\nonumber \\= & {} {\varvec{{\varphi }}}^{\mathrm{T}}_{\mathrm{f}}(t)\varvec{a}+\varvec{b}^{\mathrm{T}}\varvec{G}(t){\varvec{\alpha }}, \end{aligned}$$
(28)

where

$$\begin{aligned} {\varvec{{\varphi }}}_\mathrm{f}(t):= & {} \left[ -x_\mathrm{f}(t-1),-x_\mathrm{f}(t-2),\ldots ,-x_\mathrm{f}(t-n_a)\right] ^{\mathrm{T}}\\&\in {\mathbb R}^{n_a}. \end{aligned}$$

The filtered output \(y_\mathrm{f}(t)\) can be expressed as

$$\begin{aligned} y_\mathrm{f}(t)= & {} x_\mathrm{f}(t)+v(t)\nonumber \\= & {} {\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{f}(t)\varvec{a}+\varvec{b}^{\mathrm{T}}\varvec{G}(t){\varvec{\alpha }}+v(t). \end{aligned}$$
(29)

Define a fictitious output and two quadratic criterion functions as

$$\begin{aligned} y_3(t):= & {} y_\mathrm{f}(t)-{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{f}(t)\varvec{a}\\= & {} \varvec{b}^{\mathrm{T}}\varvec{G}(t){\varvec{\alpha }}+v(t), \\ J_3({\varvec{{\theta }}}):= & {} \sum _{j=t-L+1}^{t}\left[ y_\mathrm{f}(j)-{\varvec{{\varphi }}}^{\mathrm{T}}_3(j){\varvec{{\theta }}}\right] ^2,\\ J_4({\varvec{\alpha }}):= & {} \sum _{j=t-L+1}^{t}\left[ y_3(j)-{\varvec{{\varphi }}}^{\mathrm{T}}_4(j){\varvec{\alpha }}\right] ^2, \end{aligned}$$

where

$$\begin{aligned} {\varvec{{\varphi }}}_3(t):= & {} \left[ \begin{array}{c} {\varvec{{\varphi }}}_\mathrm{f}(t) \\ \varvec{G}(t){\varvec{\alpha }} \end{array} \right] \in {\mathbb R}^{n_a+n_b}, \end{aligned}$$
(30)
$$\begin{aligned} {\varvec{{\varphi }}}_4(t):= & {} \varvec{G}^{\mathrm{T}}(t)\varvec{b}\in {\mathbb R}^m. \end{aligned}$$
(31)

Minimizing the criterion functions \(J_3({\varvec{{\theta }}})\) and \(J_4({\varvec{\alpha }})\), and letting the partial derivatives of \(J_3({\varvec{{\theta }}})\) and \(J_4({\varvec{\alpha }})\) with respect to \({\varvec{{\theta }}}\) and \({\varvec{\alpha }}\) be zero, respectively, give the following iterative algorithm to estimate \({\varvec{{\theta }}}\) and \({\varvec{\alpha }}\):

$$\begin{aligned} \hat{{\varvec{{\theta }}}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t {\varvec{{\varphi }}}_3(j){\varvec{{\varphi }}}^{\mathrm{T}}_3(j) \right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t {\varvec{{\varphi }}}_3(j)y_\mathrm{f}(j), \end{aligned}$$
(32)
$$\begin{aligned} \hat{{\varvec{\alpha }}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t {\varvec{{\varphi }}}_4(j){\varvec{{\varphi }}}^{\mathrm{T}}_4(j) \right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t {\varvec{{\varphi }}}_4(j)y_3(j)\nonumber \\= & {} \left[ \sum _{j=t-L+1}^t {\varvec{{\varphi }}}_4(j){\varvec{{\varphi }}}^{\mathrm{T}}_4(j) \right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t {\varvec{{\varphi }}}_4(j)\left[ y_\mathrm{f}(j)-{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{f}(j)\varvec{a}\right] . \end{aligned}$$
(33)

However, the polynomial C(z) is unknown, and so are the filtered output \(y_\mathrm{f}(t)\), the filtered input \(\bar{u}_\mathrm{f}(t)\) and the filtered information matrix \(\varvec{G}(t)\). Thus, it is impossible to obtain the estimates \(\hat{{\varvec{{\theta }}}}_k(t)\) and \(\hat{{\varvec{\alpha }}}_k(t)\) by (32) and (33) directly. Here, we first compute the parameter estimation vector \(\hat{\varvec{c}}_k(t)=[\hat{c}_{1,k}(t),\hat{c}_{2,k}(t),\ldots ,\hat{c}_{n_c,k}(t)]^{\mathrm{T}}\) and then use the estimated polynomial \(\hat{C}_k(t,z):=1+\hat{c}_{1,k}(t)z^{-1}+\hat{c}_{2,k}(t)z^{-2}+\cdots +\hat{c}_{n_c,k}(t)z^{-n_c}\) to filter y(t) and \(\bar{u}(t)\) to obtain the estimates \(\hat{y}_{\mathrm{f},k}(t)\), \(\hat{\bar{u}}_{\mathrm{f},k}(t)\) and \(\hat{\varvec{G}}_k(t)\).

According to (5), define a quadratic criterion function:

$$\begin{aligned} J_5(\varvec{c}):=\sum _{j=t-L+1}^t\left[ w(j)-{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{n}(j)\varvec{c}\right] ^2. \end{aligned}$$

Minimizing the criterion functions \(J_5(\varvec{c})\) gives the iterative estimate of \(\varvec{c}\):

$$\begin{aligned} \hat{\varvec{c}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t {\varvec{{\varphi }}}_\mathrm{n}(j){\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{n}(j)\right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t {\varvec{{\varphi }}}_\mathrm{n}(j)w(j). \end{aligned}$$
(34)

We can find that the right-hand side of (34) contains the unknown information vector \({\varvec{{\varphi }}}_\mathrm{n}(t)\) and intermediate variable w(t). Similarly, replacing the unknown \({\varvec{{\varphi }}}_\mathrm{n}(t)\) and w(t) in (34) with their corresponding estimates \(\hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(t)\) and \(\hat{w}_k(t)\), we can obtain the least squares iterative algorithm for computing the estimate \(\hat{\varvec{c}}_k(t)\) as follows:

$$\begin{aligned} \hat{\varvec{c}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(j)\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{n},k}(j)\right] ^{-1} \nonumber \\&\times \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(j)\hat{w}_k(j), \end{aligned}$$
(35)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(t)= & {} \left[ -\hat{w}_{k-1}(t-1),-\hat{w}_{k-1}(t-2),\ldots ,\right. \nonumber \\&\left. -\,\hat{w}_{k-1}(t-n_c)\right] ^{\mathrm{T}}, \end{aligned}$$
(36)
$$\begin{aligned} \hat{w}_k(t-i)= & {} y(t-i)-\hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}^{\mathrm{T}}(t-i)\hat{\varvec{a}}_{k-1}(t)\nonumber \\&-\,\hat{\varvec{b}}_{k-1}^{\mathrm{T}}(t)\varvec{F}(t-i)\hat{{\varvec{\alpha }}}_{k-1}(t). \end{aligned}$$
(37)
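As a concrete illustration of (35)–(37), the sketch below fits the AR coefficients of the noise model by least squares from a residual sequence; the synthetic check at the end uses an assumed AR(1) process and is not part of the example in Sect. 6.

```python
import numpy as np

def estimate_c(w_hat, nc, L):
    """Least squares estimate of the AR noise coefficients c as in (35),
    using the residual estimates w_hat over the last L samples."""
    t_end = len(w_hat)
    rows, targets = [], []
    for j in range(t_end - L, t_end):
        rows.append([-w_hat[j - i] for i in range(1, nc + 1)])   # phi_n(j), cf. (36)
        targets.append(w_hat[j])
    c_hat, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return c_hat

# hypothetical check on a synthetic AR(1) sequence w(t) = -0.85 w(t-1) + v(t)
rng = np.random.default_rng(1)
w = np.zeros(2000)
for t in range(1, 2000):
    w[t] = -0.85 * w[t - 1] + 0.5 * rng.standard_normal()
print(estimate_c(w, nc=1, L=1500))   # approximately [0.85]
```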

Using the obtained estimate \(\hat{\varvec{c}}_k(t)\) to construct the polynomial

$$\begin{aligned} \hat{C}_k(t,z)= & {} 1+\hat{c}_{1,k}(t)z^{-1}+\hat{c}_{2,k}(t)z^{-2}\\&+\cdots +\hat{c}_{n_c,k}(t)z^{-n_c} \end{aligned}$$

to filter y(t) and \(\hat{\bar{u}}(t)\) gives the filtered estimates \(\hat{y}_{\mathrm{f},k}(t)\) and \(\hat{\bar{u}}_{\mathrm{f},k}(t)\):

$$\begin{aligned} \hat{y}_{\mathrm{f},k}(t)= & {} \hat{C}_k(t,z)y(t)\\= & {} y(t)+\hat{c}_{1,k}(t)y(t-1)+\hat{c}_{2,k}(t)y(t-2)\\&+\cdots +\hat{c}_{n_c,k}(t)y(t-n_c),\\ \hat{\bar{u}}_{\mathrm{f},k}(t)= & {} \hat{C}_k(t,z)\hat{\bar{u}}(t)\\= & {} \hat{C}_k(t,z)\left[ \hat{\alpha }_{1,k}(t)f_1(u(t))+\hat{\alpha }_{2,k}(t)f_2(u(t))\right. \\&\left. +\,\cdots +\hat{\alpha }_{m,k}(t)f_m(u(t))\right] \\= & {} \hat{\alpha }_{1,k}(t)\hat{g}_{1,k}(t)+\hat{\alpha }_{2,k}(t)\hat{g}_{2,k}(t)\\&+\cdots +\hat{\alpha }_{m,k}(t)\hat{g}_{m,k}(t), \end{aligned}$$

where \(\hat{g}_{j,k}(t)\) can be computed by

$$\begin{aligned} \hat{g}_{j,k}(t)= & {} \hat{C}_k(t,z)f_j(u(t))\\= & {} f_j(u(t))+\hat{c}_{1,k}(t)f_j(u(t-1))\\&+\,\hat{c}_{2,k}(t)f_j(u(t-2))+\cdots \\&+\,\hat{c}_{n_c,k}(t)f_j(u(t-n_c)). \end{aligned}$$
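Since \(\hat{C}_k(t,z)\) is a finite polynomial in \(z^{-1}\), the filtering operations used for \(\hat{y}_{\mathrm{f},k}(t)\) and \(\hat{g}_{j,k}(t)\) reduce to short convolutions. A minimal sketch (assuming zero initial conditions, i.e., samples before \(t=0\) are zero) is:

```python
import numpy as np

def filter_with_C(signal, c_hat):
    """Apply the estimated filter C_k(z) = 1 + c1 z^-1 + ... + c_nc z^-nc to a signal,
    as in the computation of y_f and g_j; samples before t = 0 are taken as zero."""
    out = signal.astype(float).copy()
    for i, ci in enumerate(c_hat, start=1):
        out[i:] += ci * signal[:-i]
    return out

# e.g. y_f(t) = y(t) + c1 y(t-1) and g_j(t) = C(z) f_j(u(t))
y = np.arange(5.0)
print(filter_with_C(y, np.array([0.85])))  # [0, 1+0.85*0, 2+0.85*1, ...]
```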

Let \(\hat{{\varvec{{\varphi }}}}_{\mathrm{f},k}(t)\) be the estimate of \({\varvec{{\varphi }}}_\mathrm{f}(t)\) and define

$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{\mathrm{f},k}(t):= & {} \left[ -\hat{x}_{\mathrm{f},k-1}(t-1),-\hat{x}_{\mathrm{f},k-1}(t-2),\ldots ,\right. \\&\left. -\,\hat{x}_{\mathrm{f},k-1}(t-n_a)\right] ^{\mathrm{T}}\in {\mathbb R}^{n_a}. \end{aligned}$$

From (28), we have \(x_{\mathrm{f}}(t-i)={\varvec{{\varphi }}}^{\mathrm{T}}_{\mathrm{f}}(t-i)\varvec{a}+\varvec{b}^{\mathrm{T}}\varvec{G}(t-i){\varvec{\alpha }}\). Replacing the parameter vectors \(\varvec{a}\), \(\varvec{b}\) and \({\varvec{\alpha }}\) with their estimates \(\hat{\varvec{a}}_k(t)\), \(\hat{\varvec{b}}_k(t)\) and \(\hat{{\varvec{\alpha }}}_k(t)\) at iteration k and the unknown \({\varvec{{\varphi }}}_\mathrm{f}(t-i)\) and \(\varvec{G}(t-i)\) with their estimates \(\hat{{\varvec{{\varphi }}}}_{\mathrm{f},k}(t-i)\) and \(\hat{\varvec{G}}_k(t-i)\), respectively, the estimate \(\hat{x}_{\mathrm{f},k}(t-i)\) can be computed by

$$\begin{aligned} \hat{x}_{\mathrm{f},k}(t-i)=\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{f},k}(t-i)\hat{\varvec{a}}_k(t)+\hat{\varvec{b}}^{\mathrm{T}}_k(t)\hat{\varvec{G}}_k(t-i)\hat{{\varvec{\alpha }}}_k(t). \end{aligned}$$

According to (30) and (31), we define

$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{3,k}(t):= & {} \left[ \begin{array}{c} \hat{{\varvec{{\varphi }}}}_{\mathrm{f},k}(t) \\ \hat{\varvec{G}}_k(t)\hat{{\varvec{\alpha }}}_{k-1}(t) \end{array} \right] \in {\mathbb R}^{n_a+n_b},\\ \hat{{\varvec{{\varphi }}}}_{4,k}(t):= & {} \hat{\varvec{G}}^{\mathrm{T}}_k(t)\hat{\varvec{b}}_k(t)\in {\mathbb R}^m. \end{aligned}$$

Replacing \({\varvec{{\varphi }}}_3(t)\) and \(y_{\mathrm{f}}(t)\) in (32) with their estimates \(\hat{{\varvec{{\varphi }}}}_{3,k}(t)\) and \(\hat{y}_{\mathrm{f},k}(t)\), and replacing \({\varvec{{\varphi }}}_4(t)\), \(y_{\mathrm{f}}(t)\) and \({\varvec{{\varphi }}}_{\mathrm{f}}(t)\) in (33) with their estimates \(\hat{{\varvec{{\varphi }}}}_{4,k}(t)\), \(\hat{y}_{\mathrm{f},k}(t)\) and \(\hat{{\varvec{{\varphi }}}}_{\mathrm{f},k}(t)\), respectively, we can obtain the following data filtering-based least squares iterative algorithm by using the model decomposition technique (the F-D-LSI algorithm for short):

$$\begin{aligned} \hat{{\varvec{{\theta }}}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{3,k}(j)\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{3,k}(j) \right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{3,k}(j)\hat{y}_{\mathrm{f},k}(j), \end{aligned}$$
(38)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{3,k}(t)= & {} \left[ \hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{f},k}(t),\hat{{\varvec{\alpha }}}^{\mathrm{T}}_{k-1}(t)\hat{\varvec{G}}^{\mathrm{T}}_k(t)\right] ^{\mathrm{T}}, \end{aligned}$$
(39)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{\mathrm{f},k}(t)= & {} \left[ -\hat{x}_{\mathrm{f},k-1}(t-1),-\hat{x}_{\mathrm{f},k-1}(t-2),\ldots ,\right. \nonumber \\&\left. -\,\hat{x}_{\mathrm{f},k-1}(t-n_a)\right] ^{\mathrm{T}}, \end{aligned}$$
(40)
$$\begin{aligned} \hat{x}_{\mathrm{f},k}(t-i)= & {} \hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{f},k}(t-i)\hat{\varvec{a}}_k(t)+\hat{\varvec{b}}^{\mathrm{T}}_k(t)\hat{\varvec{G}}_k(t-i)\hat{{\varvec{\alpha }}}_k(t),\nonumber \\&\qquad i=1,2,\ldots ,n_a, \end{aligned}$$
(41)
$$\begin{aligned} \hat{{\varvec{\alpha }}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{4,k}(j)\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{4,k}(j) \right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{4,k}(j)\left[ \hat{y}_{\mathrm{f},k}(j)-\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{f},k}(j)\hat{\varvec{a}}_k(j)\right] ,\nonumber \\ \end{aligned}$$
(42)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{4,k}(t)= & {} \hat{\varvec{G}}^{\mathrm{T}}_k(t)\hat{\varvec{b}}_k(t), \end{aligned}$$
(43)
$$\begin{aligned} \hat{\varvec{G}}_k(t)= & {} \left[ \begin{array}{cccc}\hat{g}_{1,k}(t-1) &{} \hat{g}_{2,k}(t-1) &{} \cdots &{} \hat{g}_{m,k}(t-1)\\ \hat{g}_{1,k}(t-2) &{} \hat{g}_{2,k}(t-2) &{} \cdots &{} \hat{g}_{m,k}(t-2)\\ \vdots &{} \vdots &{} &{} \vdots \\ \hat{g}_{1,k}(t-n_b) &{} \hat{g}_{2,k}(t-n_b) &{} \cdots &{} \hat{g}_{m,k}(t-n_b)\\ \end{array}\right] ,\nonumber \\ \end{aligned}$$
(44)
$$\begin{aligned} \hat{g}_{j,k}(t)= & {} f_j(u(t))+\hat{c}_{1,k}(t)f_j(u(t-1))\nonumber \\&+\,\hat{c}_{2,k}(t)f_j(u(t-2)) +\cdots \nonumber \\&+\,\hat{c}_{n_c,k}(t)f_j(u(t-n_c)), \end{aligned}$$
(45)
$$\begin{aligned} \hat{y}_{\mathrm{f},k}(t)= & {} y(t)+\hat{c}_{1,k}(t)y(t-1)+\hat{c}_{2,k}(t)y(t-2)\nonumber \\&+\cdots +\hat{c}_{n_c,k}(t)y(t-n_c), \end{aligned}$$
(46)
$$\begin{aligned} \hat{\varvec{c}}_k(t)= & {} \left[ \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(j)\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{n},k}(j)\right] ^{-1}\nonumber \\&\times \sum _{j=t-L+1}^t \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(j)\hat{w}_k(j), \end{aligned}$$
(47)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(t)= & {} \left[ -\hat{w}_{k-1}(t-1),-\hat{w}_{k-1}(t-2),\ldots ,\right. \nonumber \\&\left. -\,\hat{w}_{k-1}(t-n_c)\right] ^{\mathrm{T}}, \end{aligned}$$
(48)
$$\begin{aligned} \hat{w}_k(t-j)= & {} y(t-j)-\hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}^{\mathrm{T}}(t-j)\hat{\varvec{a}}_{k-1}(t)\nonumber \\&-\,\hat{\varvec{b}}_{k-1}^{\mathrm{T}}(t)\varvec{F}(t-j)\hat{{\varvec{\alpha }}}_{k-1}(t),\nonumber \\&\ j=1,2,\ldots ,n_c \end{aligned}$$
(49)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}(t)= & {} \left[ -\hat{x}_{k-1}(t-1),-\hat{x}_{k-1}(t-2),\ldots ,\right. \nonumber \\&\left. -\,\hat{x}_{k-1}(t-n_a)\right] ^{\mathrm{T}}, \end{aligned}$$
(50)
$$\begin{aligned} \hat{x}_k(t-i)= & {} \left[ \hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{a},k}(t-i),\hat{{\varvec{\alpha }}}^{\mathrm{T}}_k(t)\varvec{F}^{\mathrm{T}}(t-i)\right] \hat{{\varvec{{\theta }}}}_k(t),\nonumber \\&i=1,2,\ldots ,n_a, \end{aligned}$$
(51)
$$\begin{aligned} \varvec{F}(t)= & {} \left[ \begin{array}{cccc}f_1(u(t-1))&{}f_2(u(t-1)) &{} \cdots &{} f_m(u(t-1)) \\ f_1(u(t-2))&{}f_2(u(t-2)) &{} \cdots &{} f_m(u(t-2)) \\ \vdots &{}\vdots &{} &{} \vdots \\ f_1(u(t-n_b))&{}f_2(u(t-n_b)) &{}\cdots &{}f_m(u(t-n_b)) \end{array} \right] ,\nonumber \\ \end{aligned}$$
(52)
$$\begin{aligned} \hat{\varvec{a}}_k(t)= & {} \hat{{\varvec{{\theta }}}}_k(t)(1:n_a), \end{aligned}$$
(53)
$$\begin{aligned} \hat{\varvec{b}}_k(t)= & {} \hat{{\varvec{{\theta }}}}_k(t)(n_a+1:n_a+n_b), \end{aligned}$$
(54)
$$\begin{aligned} \bar{{\varvec{\alpha }}}_k(t)= & {} \mathrm{{sgn}}[\hat{{\varvec{\alpha }}}_k(t)(1)]\frac{\hat{{\varvec{\alpha }}}_k(t)}{\Vert \hat{{\varvec{\alpha }}}_k(t)\Vert }, \ \ \hat{{\varvec{\alpha }}}_k(t)=\bar{{\varvec{\alpha }}}_k(t), \end{aligned}$$
(55)
$$\begin{aligned} \hat{\varvec{c}}_{k}(t)= & {} \left[ \hat{c}_{1,k}(t), \hat{c}_{2,k}(t),\ldots ,\hat{c}_{n_c,k}(t)\right] ^{\mathrm{T}}, \end{aligned}$$
(56)
$$\begin{aligned} \hat{{\varvec{\varTheta }}}_k(t)= & {} \left[ \hat{{\varvec{{\theta }}}}^{\mathrm{T}}_k(t),\hat{{\varvec{\alpha }}}^{\mathrm{T}}_k(t),\hat{\varvec{c}}^{\mathrm{T}}_k(t)\right] ^{\mathrm{T}}. \end{aligned}$$
(57)

To initialize the F-D-LSI algorithm, let \(k=1\) and set the initial values: \(\hat{{\varvec{{\theta }}}}_0(t)=\left[ \begin{array}{c} \hat{\varvec{a}}_0(t) \\ \hat{\varvec{b}}_0(t) \end{array} \right] \) is any nonzero real vector with \(\hat{\varvec{b}}_0(t)\ne 0\), \(\hat{{\varvec{\alpha }}}_0(t)\) is a real vector with \(\Vert \hat{{\varvec{\alpha }}}_0(t)\Vert =1\), \(\hat{\varvec{c}}_0(t)\) is an arbitrary real vector, \(\hat{x}_{\mathrm{f},0}(t-i)\), \(\hat{x}_0(t-i)\) and \(\hat{w}_0(t-i)\) are random numbers, and \(\hat{y}_{\mathrm{f},0}(t-i)=1/p_0\), where \(p_0\) is a large number, for example \(p_0=10^6\). The flowchart of the F-D-LSI algorithm for computing \(\hat{{\varvec{{\theta }}}}_k(t)\), \( \hat{{\varvec{\alpha }}}_k(t)\) and \(\hat{\varvec{c}}_k(t)\) is shown in Fig. 2.
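A minimal sketch of this initialization in Python is given below; the orders, data length and the particular nonzero vectors chosen are illustrative assumptions.

```python
import numpy as np

# Minimal initialization sketch for the F-D-LSI algorithm (orders and N are assumed).
rng = np.random.default_rng(0)
na, nb, nc, m, N = 2, 2, 1, 2, 3000
p0 = 1e6

theta0 = 1e-3 * np.ones(na + nb)          # any nonzero vector with b0 != 0
alpha0 = np.ones(m) / np.sqrt(m)          # unit norm, first entry positive
c0 = np.zeros(nc)                         # arbitrary real vector
x0  = rng.standard_normal(N)              # random initial x_hat
xf0 = rng.standard_normal(N)              # random initial x_f_hat
w0  = rng.standard_normal(N)              # random initial w_hat
yf0 = np.full(N, 1.0 / p0)                # filtered output initialized to 1/p0
k = 1                                     # iteration counter
```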

Fig. 2
figure 2

The flowchart of the F-D-LSI algorithm for computing \(\hat{{\varvec{{\theta }}}}_k(t)\), \( \hat{{\varvec{\alpha }}}_k(t)\) and \(\hat{\varvec{c}}_k(t)\)

5 The F-D-LSI algorithm with finite measurement data

On the basis of the F-D-LSI algorithm, this section simply gives the data filtering-based least squares iterative algorithm with finite measurement data. Letting \(t=L\), from \(J_3({\varvec{{\theta }}})\), \(J_4({\varvec{\alpha }})\) and \(J_5(\varvec{c})\), we have

$$\begin{aligned} J_6({\varvec{{\theta }}}):= & {} \sum _{j=1}^L\left[ y_{\mathrm{f}}(j)-{\varvec{{\varphi }}}^{\mathrm{T}}_3(j){\varvec{{\theta }}}\right] ^2,\\ J_7({\varvec{\alpha }}):= & {} \sum _{j=1}^{L}\left[ y_\mathrm{f}(j)-{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{f}(j)\varvec{a}-{\varvec{{\varphi }}}^{\mathrm{T}}_4(j){\varvec{\alpha }}\right] ^2,\\ J_8(\varvec{c}):= & {} \sum _{j=1}^L\left[ w(j)-{\varvec{{\varphi }}}^{\mathrm{T}}_\mathrm{n}(j)\varvec{c}\right] ^2. \end{aligned}$$

Following the same procedure used to derive the F-D-LSI algorithm and minimizing the criterion functions \(J_6({\varvec{{\theta }}})\), \(J_7({\varvec{\alpha }})\) and \(J_8(\varvec{c})\), we can obtain the F-D-LSI algorithm with finite measurement data for estimating \(\hat{{\varvec{{\theta }}}}_k\), \(\hat{{\varvec{\alpha }}}_k\) and \(\hat{\varvec{c}}_k\) as follows:

$$\begin{aligned} \hat{{\varvec{{\theta }}}}_k= & {} \left[ \sum _{j=1}^L \hat{{\varvec{{\varphi }}}}_{3,k}(j)\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{3,k}(j) \right] ^{-1}\nonumber \\&\times \sum _{j=1}^L \hat{{\varvec{{\varphi }}}}_{3,k}(j)\hat{y}_{\mathrm{f},k}(j), \end{aligned}$$
(58)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{3,k}(t)= & {} [\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{f},k}(t),\hat{{\varvec{\alpha }}}^{\mathrm{T}}_{k-1}\hat{\varvec{G}}^{\mathrm{T}}_k(t)]^{\mathrm{T}},\ \ t=1,2,\ldots ,L,\nonumber \\ \end{aligned}$$
(59)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{\mathrm{f},k}(t)= & {} [-\hat{x}_{\mathrm{f},k-1}(t-1),-\hat{x}_{\mathrm{f},k-1}(t-2),\ldots ,\nonumber \\&-\,\hat{x}_{\mathrm{f},k-1}(t-n_a)]^{\mathrm{T}}, \end{aligned}$$
(60)
$$\begin{aligned} \hat{x}_{\mathrm{f},k}(t)= & {} \hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{f},k}(t)\hat{\varvec{a}}_k+\hat{\varvec{b}}^{\mathrm{T}}_k\hat{\varvec{G}}_k(t)\hat{{\varvec{\alpha }}}_k, \end{aligned}$$
(61)
$$\begin{aligned} \hat{{\varvec{\alpha }}}_k= & {} \left[ \sum _{j=1}^L \hat{{\varvec{{\varphi }}}}_{4,k}(j)\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{4,k}(j) \right] ^{-1}\nonumber \\&\times \sum _{j=1}^L \hat{{\varvec{{\varphi }}}}_{4,k}(j) [\hat{y}_{\mathrm{f},k}(j)-\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{f},k}(j)\hat{\varvec{a}}_k], \end{aligned}$$
(62)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{4,k}(t)= & {} \hat{\varvec{G}}^{\mathrm{T}}_k(t)\hat{\varvec{b}}_k, \end{aligned}$$
(63)
$$\begin{aligned} \hat{\varvec{G}}_k(t)= & {} \left[ \begin{array}{cccc}\hat{g}_{1,k}(t-1) &{} \hat{g}_{2,k}(t-1) &{} \cdots &{} \hat{g}_{m,k}(t-1)\\ \hat{g}_{1,k}(t-2) &{} \hat{g}_{2,k}(t-2) &{} \cdots &{} \hat{g}_{m,k}(t-2)\\ \vdots &{} \vdots &{} &{} \vdots \\ \hat{g}_{1,k}(t-n_b) &{} \hat{g}_{2,k}(t-n_b) &{} \cdots &{} \hat{g}_{m,k}(t-n_b)\\ \end{array}\right] ,\nonumber \\ \end{aligned}$$
(64)
$$\begin{aligned} \hat{g}_{j,k}(t)= & {} f_j(u(t))+\hat{c}_{1,k}f_j(u(t-1))\nonumber \\&+\,\hat{c}_{2,k}f_j(u(t-2))+\cdots \nonumber \\&+\,\hat{c}_{n_c,k}f_j(u(t-n_c)), \end{aligned}$$
(65)
$$\begin{aligned} \hat{y}_{\mathrm{f},k}(t)= & {} y(t)+\hat{c}_{1,k}y(t-1)+\hat{c}_{2,k}y(t-2)+\cdots \nonumber \\&+\,\hat{c}_{n_c,k}y(t-n_c), \end{aligned}$$
(66)
$$\begin{aligned} \hat{\varvec{c}}_k= & {} \left[ \sum _{j=1}^L \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(j)\hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{n},k}(j)\right] ^{-1}\nonumber \\&\times \sum _{j=1}^L \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(j)\hat{w}_k(j), \end{aligned}$$
(67)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{\mathrm{n},k}(t)= & {} \left[ -\hat{w}_{k-1}(t-1),-\hat{w}_{k-1}(t-2),\ldots ,\right. \nonumber \\&\left. -\,\hat{w}_{k-1}(t-n_c)\right] ^{\mathrm{T}}, \end{aligned}$$
(68)
$$\begin{aligned} \hat{w}_k(t)= & {} y(t)-\hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}^{\mathrm{T}}(t)\hat{\varvec{a}}_{k-1}-\hat{\varvec{b}}_{k-1}^{\mathrm{T}}\varvec{F}(t)\hat{{\varvec{\alpha }}}_{k-1}, \end{aligned}$$
(69)
$$\begin{aligned} \hat{{\varvec{{\varphi }}}}_{\mathrm{a},k}(t)= & {} \left[ -\hat{x}_{k-1}(t-1),-\hat{x}_{k-1}(t-2),\ldots ,\right. \nonumber \\&\left. -\,\hat{x}_{k-1}(t-n_a)\right] ^{\mathrm{T}}, \end{aligned}$$
(70)
$$\begin{aligned} \hat{x}_k(t)= & {} \left[ \hat{{\varvec{{\varphi }}}}^{\mathrm{T}}_{\mathrm{a},k}(t),\hat{{\varvec{\alpha }}}^{\mathrm{T}}_k\varvec{F}^{\mathrm{T}}(t)\right] \hat{{\varvec{{\theta }}}}_k, \end{aligned}$$
(71)
$$\begin{aligned} \varvec{F}(t)= & {} \left[ \begin{array}{cccc}f_1(u(t-1))&{}f_2(u(t-1)) &{} \cdots &{} f_m(u(t-1)) \\ f_1(u(t-2))&{}f_2(u(t-2)) &{} \cdots &{} f_m(u(t-2)) \\ \vdots &{}\vdots &{} &{} \vdots \\ f_1(u(t-n_b))&{}f_2(u(t-n_b)) &{}\cdots &{}f_m(u(t-n_b)) \end{array} \right] ,\nonumber \\ \end{aligned}$$
(72)
$$\begin{aligned} \hat{\varvec{a}}_k= & {} \hat{{\varvec{{\theta }}}}_k(1:n_a), \end{aligned}$$
(73)
$$\begin{aligned} \hat{\varvec{b}}_k= & {} \hat{{\varvec{{\theta }}}}_k(1+n_a:n_a+n_b), \end{aligned}$$
(74)
$$\begin{aligned} \bar{{\varvec{\alpha }}}_k= & {} \mathrm{{sgn}}[\hat{{\varvec{\alpha }}}_k(1)]\frac{\hat{{\varvec{\alpha }}}_k}{\Vert \hat{{\varvec{\alpha }}}_k\Vert }, \ \ \hat{{\varvec{\alpha }}}_k=\bar{{\varvec{\alpha }}}_k, \end{aligned}$$
(75)
$$\begin{aligned} \hat{\varvec{c}}_{k}= & {} \left[ \hat{c}_{1,k}, \hat{c}_{2,k},\ldots ,\hat{c}_{n_c,k}\right] ^{\mathrm{T}}, \end{aligned}$$
(76)
$$\begin{aligned} \hat{{\varvec{\varTheta }}}_k= & {} [\hat{{\varvec{{\theta }}}}^{\mathrm{T}}_k,\hat{{\varvec{\alpha }}}^{\mathrm{T}}_k,\hat{\varvec{c}}^{\mathrm{T}}_k]^{\mathrm{T}}. \end{aligned}$$
(77)
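To illustrate the finite-data form, the following sketch implements the \(\hat{{\varvec{{\theta }}}}_k\) update in (58)–(60): the filtered regressors \(\hat{{\varvec{{\varphi }}}}_{3,k}(j)\) are stacked over \(j=1,\ldots ,L\) and a single least squares problem is solved. The filtered quantities \(\hat{y}_{\mathrm{f},k}\), \(\hat{x}_{\mathrm{f},k-1}\) and \(\hat{\varvec{G}}_k\) are assumed to have been computed with the current \(\hat{\varvec{c}}_k\) as in (64)–(66); the placeholder arrays in the usage lines are hypothetical.

```python
import numpy as np

def update_theta_filtered(yf_hat, xf_hat, G_hat, alpha_prev, na, L):
    """Finite-data update (58)-(60): stack phi_3(j) = [phi_f(j); G(j) alpha] for j = 1,...,L
    and solve the least squares problem for theta = [a; b].
    yf_hat, xf_hat are length-L arrays of filtered output/intermediate estimates;
    G_hat[j] is the filtered information matrix G_k(j) of size nb x m."""
    rows, targets = [], []
    for j in range(na, L):                       # start once na past values exist
        phi_f = -xf_hat[j - na:j][::-1]          # cf. (60)
        rows.append(np.concatenate([phi_f, G_hat[j] @ alpha_prev]))   # cf. (59)
        targets.append(yf_hat[j])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta[:na], theta[na:]                # a_k, b_k as extracted in (73)-(74)

# hypothetical usage with random placeholders of the right shapes
L, na, nb, m = 500, 2, 2, 2
yf = np.random.randn(L); xf = np.random.randn(L)
G = [np.random.randn(nb, m) for _ in range(L)]
a_k, b_k = update_theta_filtered(yf, xf, G, np.array([1.0, 0.0]), na, L)
```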

The flowchart of computing the parameter estimate \(\hat{{\varvec{\varTheta }}}_k\) in the F-D-LSI algorithm in (58)–(77) with finite measurement data is shown in Fig. 3.

Fig. 3
figure 3

The flowchart of the F-D-LSI algorithm with finite measurement data for computing \(\hat{{\varvec{\varTheta }}}_k\)

Table 1 The parameter estimates and errors versus iteration k (\(\sigma ^2=0.50^2\), \(L=1000\))
Table 2 The parameter estimates and errors versus iteration k (\(\sigma ^2=0.50^2\), \(L=2000\))

The F-D-LSI algorithm can be used to identify input nonlinear systems (Hammerstein nonlinear systems). A typical example is the first-order water tank system in Fig. 4, where u(t) is the valve opening, \(\bar{u}(t)\) is the water inlet flow, and y(t) is the liquid level. The transfer function of the linear dynamic block has the form \(\frac{b_1z^{-1}}{1+a_1z^{-1}}\). The nonlinearity of the valve can be approximately fitted by a polynomial or a linear combination of known basis functions, and the disturbance is an autoregressive process \(w(t):=\frac{1}{1+c_1z^{-1}}v(t)\), where v(t) is white noise. The diagram of the water tank setup is shown in Fig. 5. Thus, the proposed F-D-LSI algorithm can be applied to such a system.

Fig. 4
figure 4

An experimental setup of a water tank system

Fig. 5
figure 5

The diagram of the water tank setup

6 Example

Consider a Hammerstein nonlinear simulation model as follows:

$$\begin{aligned} y(t)= & {} \frac{B(z)}{A(z)}\bar{u}(t)+\frac{1}{C(z)}v(t),\\ \bar{u}(t)= & {} \alpha _1u^2(t)+\alpha _2u^3(t)=0.80u^2(t)+0.60u^3(t),\\ A(z)= & {} 1+a_1z^{-1}+a_2z^{-2}=1+0.38z^{-1}+0.42z^{-2},\\ B(z)= & {} b_1z^{-1}+b_2z^{-2}=0.75z^{-1}-0.33z^{-2},\\ C(z)= & {} 1+c_1z^{-1}=1+0.85z^{-1},\\ {\varvec{{\theta }}}= & {} \left[ 0.38, 0.42, 0.75, -0.33, 0.80, 0.60, 0.85\right] ^{\mathrm{T}}. \end{aligned}$$

In simulation, the input \(\{u(t)\}\) is taken as a persistent excitation signal sequence with zero mean and unit variance, and \(\{v(t)\}\) as a white noise sequence with zero mean and variance \(\sigma ^2\); the data lengths are \(L=1000\) and \(L=2000\), respectively. Applying the D-LSI algorithm in (15)–(26) and the F-D-LSI algorithm with finite measurement data in (58)–(77) to estimate the parameters of this system, the parameter estimates and their estimation errors \(\delta :=\Vert \hat{{\varvec{{\theta }}}}_k-{\varvec{{\theta }}}\Vert /\Vert {\varvec{{\theta }}}\Vert \) with different data lengths L are given in Tables 1 and 2, the F-D-LSI parameter estimation errors with different noise variances \(\sigma ^2\) are shown in Fig. 6, and the parameter estimation errors of the two algorithms are plotted in Fig. 7.
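The estimation error reported in the tables and figures is computed as in the following sketch; the estimate shown here is a hypothetical placeholder, not a value taken from Tables 1 or 2.

```python
import numpy as np

# Relative parameter estimation error delta = ||theta_hat_k - theta|| / ||theta||,
# with the true parameter vector of the simulation example above.
theta_true = np.array([0.38, 0.42, 0.75, -0.33, 0.80, 0.60, 0.85])
theta_hat_k = np.array([0.39, 0.40, 0.77, -0.31, 0.79, 0.61, 0.86])  # hypothetical estimate
delta = np.linalg.norm(theta_hat_k - theta_true) / np.linalg.norm(theta_true)
print(f"delta = {delta:.5f}")
```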

Fig. 6
figure 6

The F-D-LSI estimation errors \(\delta \) versus k (\(L=2000\))

Fig. 7
figure 7

The parameter estimation errors \(\delta \) versus k (\(\sigma ^2=1.50^2\), \(L=1000\))

When the noise variance is \(\sigma ^2=1.00^2\) and the iteration number is \(k=15\), the D-LSI estimated model is given by

$$\begin{aligned} y(t)= & {} \frac{0.76582z^{-1}-0.31992z^{-2}}{1+0.38788z^{-1}+0.37752 z^{-2}}\bar{u}(t)\\&+\frac{1}{1+0.85828z^{-1}}v(t),\\ \bar{u}(t)= & {} 0.77549u^2(t)+0.63137u^3(t), \end{aligned}$$

the F-D-LSI estimated model is given by

$$\begin{aligned} y(t)= & {} \frac{0.77354z^{-1}-0.31140z^{-2}}{1+0.39827z^{-1}+0.40385z^{-2}}\bar{u}(t)\\&+\frac{1}{1+0.85839z^{-1}}v(t),\\ \bar{u}(t)= & {} 0.79551u^2(t)+0.60594u^3(t). \end{aligned}$$

For model validation, we use a different dataset (\(L_e=1000\) samples from \(t=2001\) to 3000) and the estimated models obtained by the D-LSI algorithm and the F-D-LSI algorithm. The predicted outputs and the true outputs are plotted in Fig. 8 from \(t=2001\) to 2100 and in Fig. 9 from \(t=2001\) to 3000. The predicted outputs are used to compute the average output errors:

$$\begin{aligned} \delta _{e1}= & {} \frac{1}{1000}\left[ \sum _{j=2001}^{3000}[y(j)-\hat{y}_1(j)]^2\right] ^{\frac{1}{2}}=0.0550784,\\ \delta _{e2}= & {} \frac{1}{1000}\left[ \sum _{j=2001}^{3000}[y(j)-\hat{y}_2(j)]^2\right] ^{\frac{1}{2}}=0.0548344, \end{aligned}$$

where \(\hat{y}_1(t)\) is the predicted output given by the D-LSI model, \(\hat{y}_2(t)\) is the predicted output given by the F-D-LSI model, and y(t) is the true output.
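For completeness, the average output error defined above can be computed as in the following sketch; the array names in the commented usage are hypothetical placeholders for the validation data.

```python
import numpy as np

def avg_output_error(y_true, y_pred):
    """Average output error as defined above: sqrt(sum of squared errors) divided by
    the number of validation samples (Le = 1000 in the example)."""
    Le = len(y_true)
    return np.sqrt(np.sum((y_true - y_pred) ** 2)) / Le

# hypothetical usage with the validation segment t = 2001,...,3000
# delta_e1 = avg_output_error(y[2001:3001], y_hat_dlsi[2001:3001])
# delta_e2 = avg_output_error(y[2001:3001], y_hat_fdlsi[2001:3001])
```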

Fig. 8
figure 8

The true output and predicted output from \(t=2001\) to 2100 (\(\sigma ^2=1.00^2\), \(k=15\))

Fig. 9
figure 9

The true output and predicted output from \(t=2001\) to 3000 (\(\sigma ^2=1.00^2\), \(k=15\))

From Figs. 6, 7, 8 and 9 and Tables 1 and 2, we can draw the following conclusions.

  • The parameter estimation errors become smaller (in general) as k increases—see Figs. 6 and 7.

  • Under the same data length, the parameter estimation errors become smaller as the noise variances decrease—see Fig. 6.

  • Under the same noise variances and data lengths, the F-D-LSI algorithm can generate more accurate parameter estimates than the D-LSI algorithm—see Tables 1 and 2 and Fig. 7.

  • The F-D-LSI algorithm can generate accurate parameter estimates after only several iterations—see Tables 1 and 2.

  • The predicted outputs are very close to the true outputs, so the estimated models can capture the system dynamics well—see Figs. 8 and 9.

7 Conclusions

This paper presents a least squares iterative algorithm and a filtering-based least squares iterative algorithm for IN-OEAR systems by using the model decomposition technique. Compared with the D-LSI algorithm, the F-D-LSI algorithm has higher estimation accuracy. The simulation test validates the effectiveness of the proposed algorithms. The proposed algorithms can be extended to study the parameter estimation problems of dual-rate sampled systems and non-uniformly sampled systems [36–38] and applied to other fields [39–42].