1 Introduction

System identification aims to find a model that approximates a real system from measured data [1–3], and it is fundamental to signal processing, adaptive control, and filtering [4–6]. Nonlinearities arise widely in various aspects of society [7, 8], such as engineering practice [9–11], chemical processes [12], and biological systems [13]. The identification of nonlinear systems ranges widely from structural engineering to biomedical engineering [14, 15] and employs a number of classic and modern approaches [16]. Hammerstein models, Wiener models, and their combinations are common block-oriented nonlinear models [17–21]. A Hammerstein system consists of a static nonlinear block followed by a linear dynamic subsystem [22]. Recently, Ding and Chen [23] proposed a recursive extended least squares algorithm and a least squares based iterative identification algorithm for Hammerstein ARMAX systems, and a coupled least squares identification method for multivariate systems [24].

Iterative algorithms are widely used to find the solutions of matrix equations [25, 26] and can also be used for parameter estimation [27–30]. Recently, Li [31] proposed a maximum likelihood Newton iterative algorithm for Hammerstein CARARMA systems, and Ding et al. [32] derived a Newton iterative identification algorithm for Hammerstein nonlinear systems.

The identification model of a Hammerstein system contains the product of the parameters of the nonlinear part and the linear part. To deal with this difficulty, Vörös [33, 34] proposed the key variables separation technique for Hammerstein systems with discontinuous nonlinearities containing dead-zones [35]. Wang et al. [36] derived an auxiliary model based recursive generalized least squares parameter estimation algorithm for Hammerstein OEAR systems. Li and Ding [37] presented a maximum likelihood stochastic gradient algorithm for Hammerstein systems with colored noise based on the key term separation technique.

This paper studies iterative identification for input nonlinear finite impulse response moving average systems and derives a Newton iterative algorithm. By using the key variables separation technique, the parameters of the nonlinear part and the linear part can be estimated directly, without resorting to over-parameterization methods.

The rest of this paper is organized as follows. Section 2 describes the identification model of the Hammerstein finite impulse response moving average systems. Sections 3 and 4 derive the Newton iterative algorithm. Section 5 provides an example to show the effectiveness of the proposed algorithm. Finally, some concluding remarks are offered in Sect. 6.

2 System description

Let us introduce some notation. \(\hat{{\varvec{\vartheta }}}(t)\) denotes the estimate of \({\varvec{\vartheta }}\) at time \(t\); \(\hat{{\varvec{\vartheta }}}_k\) denotes the estimate of \({\varvec{\vartheta }}\) at iteration \(k\); \(\mathbf{1}_n\) represents an \(n\)-dimensional column vector whose elements are 1; the norm of a matrix \({{\varvec{X}}}\) is defined by \(\Vert {{\varvec{X}}}\Vert ^2:=\mathrm{tr}[{{\varvec{X}}}{{\varvec{X}}}^{\tiny \text{ T }}]\); and the superscript T denotes the matrix transpose.

Consider an input nonlinear finite impulse response moving average (IN-FIR-MA) system in Fig. 1 [37], which consists of a nonlinear static block \(f(\cdot )\) followed by a linear finite impulse response moving average (FIR-MA) subsystem, where \(u(t)\) is the input sequence of the system, \(y(t)\) is the output sequence, \(v(t)\) is a white noise with zero mean, and \(x(t)\) and \(w(t)\) are the inner variables.

Fig. 1 An input nonlinear finite impulse response moving average system

The output \(\bar{u}(t)\) of the nonlinear block is a linear combination of a known basis \({{\varvec{f}}}(u(t)):=(f_1(u(t)), f_2(u(t)), \ldots , f_{n_c}(u(t)))\) with coefficients \((c_1, c_2, \ldots , c_{n_c})\) and can be written as

$$\begin{aligned} \bar{u}(t)&= f(u(t))=c_1f_1(u(t))+c_2f_2(u(t))\nonumber \\&+\cdots +c_{n_c}f_{n_c}(u(t)) =\sum \limits _{j=1}^{n_c}c_jf_j(u(t)). \end{aligned}$$
(1)
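To make (1) concrete, the following minimal Python sketch (an illustration, not part of the paper) evaluates the nonlinear block for a polynomial basis \(f_j(u)=u^j\), which is the basis used in the example of Sect. 5; the helper names f_basis and nonlinear_block are hypothetical.

```python
import numpy as np

def f_basis(u, n_c):
    """Row vector f(u) = (f_1(u), ..., f_{n_c}(u)) for the assumed
    polynomial basis f_j(u) = u**j."""
    return np.array([u ** j for j in range(1, n_c + 1)])

def nonlinear_block(u, c):
    """u_bar = f(u) c, Eq. (1)."""
    return f_basis(u, len(c)) @ np.asarray(c)

# Example: c = (0.8, 0.5) gives u_bar = 0.8 u + 0.5 u^2.
print(nonlinear_block(0.3, [0.80, 0.50]))   # 0.285
```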

The linear part is an FIR-MA model, and \(B(z)\) and \(D(z)\) are polynomials in the backward shift operator \(z^{-1}\) [i.e., \(z^{-1}y(t)=y(t-1)\)], with

$$\begin{aligned} B(z)&:= b_0+b_1z^{-1}+b_2z^{-2}+\cdots +b_{n_b}z^{-n_b},\\ D(z)&:= 1+d_1z^{-1}+d_2z^{-2}+\cdots +d_{n_d}z^{-n_d}. \end{aligned}$$

Define parameter vectors

$$\begin{aligned} {{\varvec{b}}}&:= [b_0, b_1, b_2, \ldots , b_{n_b}]^{\tiny \text{ T }}\in {\mathbb {R}}^{n_b+1},\\ {{\varvec{c}}}&:= [c_1, c_2, \ldots , c_{n_c}]^{\tiny \text{ T }}\in {\mathbb {R}}^{n_c},\\ {{\varvec{d}}}&:= [d_1, d_2, \ldots , d_{n_d}]^{\tiny \text{ T }}\in {\mathbb {R}}^{n_d}. \end{aligned}$$

Define the information matrix \({{\varvec{F}}}(t)\) and the noise information vector \(\varvec{\psi }(t)\) as

$$\begin{aligned} {{\varvec{F}}}(t)&:= [{{\varvec{f}}}(u(t)), {{\varvec{f}}}(u(t-1)),\\&\ldots , {{\varvec{f}}}(u(t-n_b))]^{\tiny \text{ T }}\in {\mathbb {R}}^{(n_b+1)\times n_c},\\ \varvec{\psi }(t)&:= [v(t-1), v(t-2), \ldots , v(t-n_d)]^{\tiny \text{ T }}\in {\mathbb {R}}^{n_d}. \end{aligned}$$

Then the output \(y(t)\) in Fig. 1 can be expressed as

$$\begin{aligned} y(t)&= x(t)+w(t)\nonumber \\&= B(z)\bar{u}(t)+D(z)v(t)\nonumber \\&= (b_0+b_1z^{-1}+b_2z^{-2}+\cdots +b_{n_b}z^{-n_b})\bar{u}(t)\nonumber \\&+\,(1+d_1z^{-1}+d_2z^{-2}+\cdots +d_{n_d}z^{-n_d})v(t)\nonumber \\&= b_0\bar{u}(t)+b_1\bar{u}(t-1)+b_2\bar{u}(t-2)\nonumber \\&+\cdots +b_{n_b}\bar{u}(t-n_b)+d_1v(t-1)\nonumber \\&+\,d_2v(t-2)+\cdots +d_{n_d}v(t-n_d)+v(t)\end{aligned}$$
(2)
$$\begin{aligned}&= b_0{{\varvec{f}}}(u(t)){{\varvec{c}}}+b_1{{\varvec{f}}}(u(t-1)){{\varvec{c}}}+b_2{{\varvec{f}}}(u(t-2)){{\varvec{c}}}\nonumber \\&+\cdots +b_{n_b}{{\varvec{f}}}(u(t-n_b)){{\varvec{c}}}+\varvec{\psi }^{\tiny \text{ T }}(t){{\varvec{d}}}+v(t)\nonumber \\&= {{\varvec{b}}}^{\tiny \text{ T }}{{\varvec{F}}}(t){{\varvec{c}}}+\varvec{\psi }^{\tiny \text{ T }}(t){{\varvec{d}}}+v(t). \end{aligned}$$
(3)

Equation (3) contains the product of the parameter vectors \({{\varvec{b}}}\) and \({{\varvec{c}}}\), which makes it difficult to identify the parameters of the system. To avoid this parameter product between the linear and nonlinear blocks, we use the key variables separation technique and set the coefficient \(b_0=1\) [31]. Then Eq. (2) can be rewritten as

$$\begin{aligned} y(t)&= \bar{u}(t)+b_1\bar{u}(t-1)+b_2\bar{u}(t-2)\nonumber \\&+\cdots +b_{n_b}\bar{u}(t-n_b)+\varvec{\psi }^{\tiny \text{ T }}(t){{\varvec{d}}}+v(t). \end{aligned}$$
(4)

Here the first term \(\bar{u}(t)\) on the right-hand side of (4) is chosen as the separated key variable, and the remaining terms are taken as the non-separated key variables. Following the key variables separation principle [33, 34], substituting \(\bar{u}(t)\) in (1) into the separated key variable \(\bar{u}(t)\) in (4), while keeping the non-separated key variables unchanged, gives

$$\begin{aligned} y(t)&= c_1f_1(u(t))+c_2f_2(u(t))+\cdots +c_{n_c}f_{n_c}(u(t))\nonumber \\&+b_1\bar{u}(t-1)+b_2\bar{u}(t-2)\nonumber \\&+\cdots +b_{n_b}\bar{u}(t-n_b)+\varvec{\psi }^{\tiny \text{ T }}(t){{\varvec{d}}}+v(t). \end{aligned}$$
(5)

Using the key variables separation technique, we have thus expressed the output \(y(t)\) of the system in Eq. (5) as a linear regression in all the parameters.

Define the parameter vector \({\varvec{\vartheta }}\) and the information vector \({\varvec{\varphi }}(t)\) as

$$\begin{aligned} {\varvec{\vartheta }}&:= [b_1, b_2, \ldots , b_{n_b}, c_1, c_2, \ldots , c_{n_c}, d_1, d_2,\\&\ldots , d_{n_d}]^{\tiny \text{ T }}\in {\mathbb {R}}^{n_b+n_c+n_d},\\ {\varvec{\varphi }}(t)&:= [\bar{u}(t-1), \bar{u}(t-2), \ldots ,\\&\bar{u}(t-n_b), f_1(u(t)), f_2(u(t)), \ldots , f_{n_c}(u(t)),\\&\ v(t-1), v(t-2), \ldots , v(t-n_d)]^{\tiny \text{ T }}\nonumber \\&\in {\mathbb {R}}^{n_b+n_c+n_d}. \end{aligned}$$

Thus, Eq. (5) can be rewritten as

$$\begin{aligned} y(t)={\varvec{\varphi }}^{\tiny \text{ T }}(t){\varvec{\vartheta }}+v(t). \end{aligned}$$
(6)
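As a small illustration of the regression form (6), the following sketch assembles \(\varvec{\varphi }(t)\), assuming the hypothetical f_basis helper above and history arrays ordered newest-first:

```python
import numpy as np

def phi_vector(u_bar_hist, u_t, v_hist, n_b, n_c, n_d):
    """phi(t) = [u_bar(t-1..t-n_b), f(u(t)), v(t-1..t-n_d)]^T."""
    return np.concatenate([
        u_bar_hist[:n_b],    # u_bar(t-1), ..., u_bar(t-n_b)
        f_basis(u_t, n_c),   # f_1(u(t)), ..., f_{n_c}(u(t))
        v_hist[:n_d],        # v(t-1), ..., v(t-n_d)
    ])

# Then y(t) = phi_vector(...) @ theta + v(t), matching Eq. (6).
```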

3 The Newton iterative algorithm

Consider a set of data from \(i=t-L+1\) to \(i=t\), where \(L\) represents the data length. Define the stacked output vector \({{\varvec{Y}}}(t)\) and the stacked information matrix \(\varvec{\varPhi }(t)\) as

$$\begin{aligned} {{\varvec{Y}}}(t)&:= \left[ \begin{array}{c} y(t) \\ y(t-1) \\ \vdots \\ y(t-L+1) \end{array}\right] \in {\mathbb {R}}^L,\nonumber \\ \varvec{\varPhi }(t)&:= \left[ \begin{array}{c} \varvec{\varphi }^\mathrm{T}(t) \\ \varvec{\varphi }^\mathrm{T}(t-1) \\ \vdots \\ \varvec{\varphi }^\mathrm{T}(t-L+1) \end{array}\right] \nonumber \\&\in {\mathbb {R}}^{L\times (n_b+n_c+n_d)}. \end{aligned}$$
(7)

Define the criterion function,

$$\begin{aligned} J(\varvec{\vartheta }):=\Vert {{\varvec{Y}}}(t)-\varvec{\varPhi }(t)\varvec{\vartheta }\Vert ^2. \end{aligned}$$

The gradient of \(J(\varvec{\vartheta })\) with respect to \(\varvec{\vartheta }\) is

$$\begin{aligned} \mathrm{grad}[J(\varvec{\vartheta })]&= -2\varvec{\varPhi }^\mathrm{T}(t)[{{\varvec{Y}}}(t)-\varvec{\varPhi }(t)\varvec{\vartheta }]\in {\mathbb {R}}^{n_b+n_c+n_d}. \end{aligned}$$

Compute the Hessian matrix of the cost function \(J(\varvec{\vartheta })\) with respect to \(\varvec{\vartheta }\):

$$\begin{aligned} {{\varvec{H}}}(\varvec{\vartheta }):&= \frac{\partial ^2 J(\varvec{\vartheta })}{\partial \varvec{\vartheta }\partial \varvec{\vartheta }^\mathrm{T}} =\frac{\partial \mathrm{grad}[J(\varvec{\vartheta })]}{\partial \varvec{\vartheta }^\mathrm{T}}\\&= 2\varvec{\varPhi }^\mathrm{T}(t)\varvec{\varPhi }(t)\in {\mathbb {R}}^{(n_b+n_c+n_d)\times (n_b+n_c+n_d)}. \end{aligned}$$
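As a quick numerical sanity check (an addition, not part of the derivation), the gradient formula can be verified against central finite differences on random data; the Hessian \(2\varvec{\varPhi }^\mathrm{T}(t)\varvec{\varPhi }(t)\) then follows by differentiating the gradient once more:

```python
import numpy as np

rng = np.random.default_rng(0)
L_, n = 50, 5
Phi = rng.standard_normal((L_, n))
Y = rng.standard_normal(L_)
theta = rng.standard_normal(n)

J = lambda th: np.sum((Y - Phi @ th) ** 2)      # criterion function
grad = -2 * Phi.T @ (Y - Phi @ theta)           # analytic gradient

eps = 1e-6
grad_fd = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.allclose(grad, grad_fd, atol=1e-4))    # True
```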

Minimizing \(J(\varvec{\vartheta })\) by the Newton method gives

$$\begin{aligned} \hat{\varvec{\vartheta }}_k(t)&= \hat{\varvec{\vartheta }}_{k-1}(t)-[{{\varvec{H}}}(\hat{\varvec{\vartheta }}_{k-1}(t))]^{-1}\mathrm{grad}[J(\hat{\varvec{\vartheta }}_{k-1}(t))]\nonumber \\&= \hat{\varvec{\vartheta }}_{k-1}(t)+[\varvec{\varPhi }^\mathrm{T}(t)\varvec{\varPhi }(t)]^{-1}\varvec{\varPhi }^\mathrm{T}(t)\nonumber \\&\times [{{\varvec{Y}}}(t)-\varvec{\varPhi }(t)\hat{\varvec{\vartheta }}_{k-1}(t)]. \end{aligned}$$
(8)
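A minimal sketch of the update (8): since \(J(\varvec{\vartheta })\) is quadratic, the Hessian is constant and one Newton step already reaches the least squares minimizer; solving a least squares problem instead of forming the explicit inverse is a numerical choice of this sketch, not the paper's notation.

```python
import numpy as np

def newton_step(Phi, Y, theta_prev):
    """One Newton update, Eq. (8): theta + (Phi^T Phi)^{-1} Phi^T residual."""
    residual = Y - Phi @ theta_prev
    step, *_ = np.linalg.lstsq(Phi, residual, rcond=None)
    return theta_prev + step
```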

Because the information vector \(\varvec{\varphi }(t)\) contains the unknown inner variables \(\bar{u}(t-i)\) (the outputs of the nonlinear block) and the unknown noise terms \(v(t-i)\), the above algorithm cannot be applied directly to estimate \(\varvec{\vartheta }\). The solution is to use the auxiliary model identification idea [38]: the unknown variables are replaced with the outputs of an auxiliary model; that is, \(\bar{u}(t-i)\) is replaced with its estimate \(\hat{\bar{u}}_{k-1}(t-i)\) at iteration \(k-1\), and \(v(t-i)\) with its estimate \(\hat{v}_{k-1}(t-i)\). Define

$$\begin{aligned} \hat{{\varvec{\varphi }}}_k(t)&:= [\hat{\bar{u}}_{k-1}(t-1), \hat{\bar{u}}_{k-1}(t-2), \\&\ldots , \hat{\bar{u}}_{k-1}(t-n_b), {{\varvec{f}}}(u(t)), \hat{v}_{k-1}(t-1),\\&\hat{v}_{k-1}(t-2), \ldots , \hat{v}_{k-1}(t-n_d)]^\mathrm{T}\nonumber \\&\in {\mathbb {R}}^{n_b+n_c+n_d}. \end{aligned}$$

Let the iterative estimate \(\varvec{\hat{\vartheta }}_k(t)\) of \(\varvec{\vartheta }\) at iteration \(k\) be

$$\begin{aligned} \varvec{\hat{\vartheta }}_k(t)&:= [\hat{b}_{1,k}(t), \hat{b}_{2,k}(t), \ldots , \hat{b}_{n_b,k}(t), \hat{{{\varvec{c}}}}^\mathrm{T}_k(t), \hat{{{\varvec{d}}}}^\mathrm{T}_k(t)]^\mathrm{T}\nonumber \\&\in {\mathbb {R}}^{n_b+n_c+n_d},\\ \hat{{{\varvec{c}}}}_k(t)&:= [\hat{c}_{1,k}(t), \hat{c}_{2,k}(t), \ldots , \hat{c}_{n_c,k}(t)]^\mathrm{T}\in {\mathbb {R}}^{n_c},\\ \hat{{{\varvec{d}}}}_k(t)&:= [\hat{d}_{1,k}(t), \hat{d}_{2,k}(t), \ldots , \hat{d}_{n_d,k}(t)]^\mathrm{T}\in {\mathbb {R}}^{n_d}. \end{aligned}$$

Substituting \(c_j\) in (1) with \(\hat{c}_{j,k}(t)\), the iterative estimate \(\hat{\bar{u}}_k(t-i)\) of \(\bar{u}(t-i)\) at iteration \(k\) can be computed through

$$\begin{aligned} \hat{\bar{u}}_k(t-i)&= \hat{c}_{1,k}(t)f_1(u(t-i))+\hat{c}_{2,k}(t)f_2(u(t-i))\\&+\cdots +\hat{c}_{n_c,k}(t)f_{n_c}(u(t-i))\\&= \sum \limits _{j=1}^{n_c}\hat{c}_{j,k}(t)f_j(u(t-i))\\&= {{\varvec{f}}}(u(t-i))\hat{{{\varvec{c}}}}_k(t). \end{aligned}$$

The estimate \(\hat{v}_k(t-i)\) of \(v(t-i)\) can be computed through

$$\begin{aligned} \hat{v}_k(t-i)=y(t-i)-\hat{\varvec{\varphi }}^\mathrm{T}_k(t-i)\hat{\varvec{\vartheta }}_k(t). \end{aligned}$$

Replace \(\varvec{\varphi }(t)\) in \(\varvec{\varPhi }(t)\) with \(\hat{\varvec{\varphi }}_k(t)\), and define

$$\begin{aligned} \hat{\varvec{\varPhi }}_k(t)&:= [\varvec{\hat{\varphi }}_k(t), \varvec{\hat{\varphi }}_k(t-1), \ldots , \varvec{\hat{\varphi }}_k(t-L+1)]^\mathrm{T}\\&\in {\mathbb {R}}^{L\times (n_b+n_c+n_d)}. \end{aligned}$$

Substituting \(\varvec{\varPhi }(t)\) in (8) with \(\hat{\varvec{\varPhi }}_k(t)\), we obtain the Newton iterative algorithm for input nonlinear finite impulse response moving average systems:

$$\begin{aligned} \hat{\varvec{\vartheta }}_k(t)&= \hat{\varvec{\vartheta }}_{k-1}(t)+[\hat{\varvec{\varPhi }}_k^\mathrm{T}(t)\hat{\varvec{\varPhi }}_k(t)]^{-1}\hat{\varvec{\varPhi }}_k^\mathrm{T}(t)\nonumber \\&\times [{{\varvec{Y}}}(t)-\hat{\varvec{\varPhi }}_k(t)\hat{\varvec{\vartheta }}_{k-1}(t)]\end{aligned}$$
(9)
$$\begin{aligned}&= [\hat{\varvec{\varPhi }}_k^\mathrm{T}(t)\hat{\varvec{\varPhi }}_k(t)]^{-1}\hat{\varvec{\varPhi }}_k^\mathrm{T}(t){{\varvec{Y}}}(t),\ k=1, 2, 3, \ldots \end{aligned}$$
(10)
$$\begin{aligned} {{\varvec{Y}}}(t)&= [y(t), y(t-1), \ldots , y(t-L+1)]^\mathrm{T}, \end{aligned}$$
(11)
$$\begin{aligned} \hat{\varvec{\varPhi }}_k(t)&= [\hat{\varvec{\varphi }}_k(t), \hat{\varvec{\varphi }}_k(t-1), \ldots , \hat{\varvec{\varphi }}_k(t-L+1)]^\mathrm{T}, \end{aligned}$$
(12)
$$\begin{aligned} \hat{\varvec{\varphi }}_k(j)&= [\hat{\bar{u}}_{k-1}(j-1), \hat{\bar{u}}_{k-1}(j-2),\nonumber \\&\ldots , \hat{\bar{u}}_{k-1}(j-n_b), {{\varvec{f}}}(u(j)), \hat{v}_{k-1}(j-1),\nonumber \\&\hat{v}_{k-1}(j-2), \ldots , \hat{v}_{k-1}(j-n_d)]^\mathrm{T},\nonumber \\&j=t-L+1, t-L+2, \ldots , t, \end{aligned}$$
(13)
$$\begin{aligned} \hat{\bar{u}}_k(j-i)&= {{\varvec{f}}}(u(j-i))\hat{{{\varvec{c}}}}_k(t),\ i=1, 2, \ldots , n_b, \end{aligned}$$
(14)
$$\begin{aligned} \hat{v}_k(j-i)&= y(j-i)-\hat{\varvec{\varphi }}^\mathrm{T}_k(j-i)\hat{\varvec{\vartheta }}_k(t), \ i=1, 2, \ldots , n_d, \end{aligned}$$
(15)
$$\begin{aligned} {{\varvec{f}}}(u(j))&= [f_1(u(j)), f_2(u(j)), \ldots , f_{n_c}(u(j))], \end{aligned}$$
(16)
$$\begin{aligned} \varvec{\hat{\vartheta }}_k(t)&= [\hat{b}_{1,k}(t), \hat{b}_{2,k}(t), \ldots , \hat{b}_{n_b,k}(t), \hat{{{\varvec{c}}}}^\mathrm{T}_k(t),\nonumber \\&\hat{d}_{1,k}(t), \hat{d}_{2,k}(t), \ldots , \hat{d}_{n_d,k}(t)]^\mathrm{T}, \end{aligned}$$
(17)
$$\begin{aligned} \hat{{{\varvec{c}}}}_k(t)&= [\hat{c}_{1,k}(t), \hat{c}_{2,k}(t), \ldots , \hat{c}_{n_c,k}(t)]^\mathrm{T}. \end{aligned}$$
(18)

The procedure for computing the parameter estimation vector \(\hat{\varvec{\vartheta }}_k(t)\) in the Newton iterative algorithm (9)–(18) is as follows (a Python sketch is given after the list):

1. Set the data length \(L\) and let \(t=L\). Collect the input–output data {\(u(i)\), \(y(i)\): \(i=0, 1, 2, \ldots , L-1\)}, and pre-set a small \(\varepsilon >0\).

2. Collect the input–output data \(u(t)\) and \(y(t)\) and form \({{\varvec{Y}}}(t)\) by using (11).

3. Let \(k=1\), and set the initial value \(\hat{\varvec{\vartheta }}_0(t)=\mathbf{1}_{n_b+n_c+n_d}/p_0\), \(p_0=10^6\).

4. Form \(\hat{\varvec{\varphi }}_k(j)\) by using (13), and construct \(\hat{\varvec{\varPhi }}_k(t)\) by using (12).

5. Update the parameter estimate \(\hat{\varvec{\vartheta }}_k(t)\) by using (9).

6. Compute \(\hat{\bar{u}}_k(j-i)\) by using (14) and \(\hat{v}_k(j-i)\) by using (15).

7. If \(\Vert \hat{\varvec{\vartheta }}_k(t)-\hat{\varvec{\vartheta }}_{k-1}(t)\Vert >\varepsilon \), increase \(k\) by 1 and go to step 4; otherwise, obtain \(\hat{\varvec{\vartheta }}_k(t)\), let \(\hat{\varvec{\vartheta }}_0(t+1)=\hat{\varvec{\vartheta }}_k(t)\), increase \(t\) by 1, and go to step 2.
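A condensed Python sketch of steps 3–7 for one data window follows; it is an illustration under stated assumptions (polynomial basis, zero values for pre-window estimates, a cap on the iteration count), and the name newton_iterations is hypothetical.

```python
import numpy as np

def f_basis(u, n_c):
    # Polynomial basis f_j(u) = u**j (repeated here so the block runs alone).
    return np.array([u ** j for j in range(1, n_c + 1)])

def newton_iterations(u_win, y_win, n_b, n_c, n_d, eps=1e-6, max_iter=50):
    """Steps 3-7 for one window; u_win/y_win are ordered oldest-first."""
    L = len(y_win)
    n = n_b + n_c + n_d
    theta = np.full(n, 1e-6)            # theta_0 = 1_n / p0, p0 = 10^6
    u_bar_hat = np.zeros(L)             # iteration-(k-1) estimates of u_bar
    v_hat = np.zeros(L)                 # iteration-(k-1) estimates of v
    for _ in range(max_iter):
        # Build Phi_k from the previous estimates, Eqs. (12)-(13);
        # pre-window values are taken as zero (an initialization assumption).
        Phi = np.zeros((L, n))
        for j in range(L):
            past_u_bar = [u_bar_hat[j - i] if j - i >= 0 else 0.0
                          for i in range(1, n_b + 1)]
            past_v = [v_hat[j - i] if j - i >= 0 else 0.0
                      for i in range(1, n_d + 1)]
            Phi[j] = np.concatenate([past_u_bar,
                                     f_basis(u_win[j], n_c),
                                     past_v])
        # Newton update (9), via a least squares solve for robustness.
        step, *_ = np.linalg.lstsq(Phi, y_win - Phi @ theta, rcond=None)
        theta_new = theta + step
        # Refresh the inner-variable estimates, Eqs. (14)-(15).
        c_hat = theta_new[n_b:n_b + n_c]
        u_bar_hat = np.array([f_basis(u, n_c) @ c_hat for u in u_win])
        v_hat = y_win - Phi @ theta_new
        done = np.linalg.norm(theta_new - theta) <= eps    # step 7
        theta = theta_new
        if done:
            break
    return theta
```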

The flowchart of computing the parameter estimation vector \(\varvec{\hat{\vartheta }}_k(t)\) is shown in Fig. 2.

Fig. 2 The flowchart of computing the Newton iterative parameter estimate \(\hat{\varvec{\vartheta }}_k(t)\)

4 The Newton iterative algorithm with finite measurement data

Let \(L\) represent the data length and set \(t=L\) in (7). Then we have

$$\begin{aligned} {{\varvec{Y}}}&:= \left[ \begin{array}{c} y(L) \\ y(L-1) \\ \vdots \\ y(1) \end{array}\right] \in {\mathbb {R}}^L,\nonumber \\ \varvec{\varPhi }&:= \left[ \begin{array}{c} \varvec{\varphi }^\mathrm{T}(L) \\ \varvec{\varphi }^\mathrm{T}(L-1) \\ \vdots \\ \varvec{\varphi }^\mathrm{T}(1) \end{array}\right] \in {\mathbb {R}}^{L\times (n_b+n_c+n_d)}. \end{aligned}$$

\({{\varvec{Y}}}\) and \(\varvec{\varPhi }\) contain all of the input–output data {\(u(t), y(t): t=1, 2, \ldots , L\)}. The criterion function is defined as

$$\begin{aligned} J_1(\varvec{\vartheta }):=\Vert {{\varvec{Y}}}-\varvec{\varPhi }\varvec{\vartheta }\Vert ^2. \end{aligned}$$

The unknown variables \(\bar{u}(t-i)\) in the information matrix \(\varvec{\varPhi }\) are replaced with their estimates \(\hat{\bar{u}}_{k-1}(t-i)\) at iteration \(k-1\), and \(v(t-i)\) with their estimates \(\hat{v}_{k-1}(t-i)\). Similarly, we obtain the Newton iterative algorithm for the input nonlinear finite impulse response moving average system with finite measurement data:

$$\begin{aligned} \hat{\varvec{\vartheta }}_k&= \hat{\varvec{\vartheta }}_{k-1}+(\hat{\varvec{\varPhi }}_k^\mathrm{T}\hat{\varvec{\varPhi }}_k)^{-1}\hat{\varvec{\varPhi }}_k^\mathrm{T}({{\varvec{Y}}}-\hat{\varvec{\varPhi }}_k\hat{\varvec{\vartheta }}_{k-1})\end{aligned}$$
(19)
$$\begin{aligned}&= (\hat{\varvec{\varPhi }}_k^\mathrm{T}\hat{\varvec{\varPhi }}_k)^{-1}\hat{\varvec{\varPhi }}_k^\mathrm{T}{{\varvec{Y}}},\ k=1, 2, 3, \ldots \end{aligned}$$
(20)
$$\begin{aligned} {{\varvec{Y}}}&= [y(L), y(L-1), \ldots , y(1)]^\mathrm{T}, \end{aligned}$$
(21)
$$\begin{aligned} \hat{\varvec{\varPhi }}_k&= [\hat{\varvec{\varphi }}_k(L), \hat{\varvec{\varphi }}_k(L-1), \ldots , \hat{\varvec{\varphi }}_k(1)]^\mathrm{T}, \end{aligned}$$
(22)
$$\begin{aligned} \hat{\varvec{\varphi }}_k(t)&= [\hat{\bar{u}}_{k-1}(t-1), \hat{\bar{u}}_{k-1}(t-2),\nonumber \\&\ldots , \hat{\bar{u}}_{k-1}(t-n_b), {{\varvec{f}}}(u(t)), \hat{v}_{k-1}(t-1),\nonumber \\&\hat{v}_{k-1}(t-2),\nonumber \\&\ldots , \hat{v}_{k-1}(t-n_d)]^\mathrm{T}, t=1, 2, \ldots , L, \end{aligned}$$
(23)
$$\begin{aligned} \hat{\bar{u}}_k(t)&= {{\varvec{f}}}(u(t))\hat{{{\varvec{c}}}}_k,\end{aligned}$$
(24)
$$\begin{aligned} \hat{v}_k(t)&= y(t)-\hat{\varvec{\varphi }}^\mathrm{T}_k(t)\hat{\varvec{\vartheta }}_k,\end{aligned}$$
(25)
$$\begin{aligned} {{\varvec{f}}}(u(t))&= [f_1(u(t)), f_2(u(t)), \ldots , f_{n_c}(u(t))], \end{aligned}$$
(26)
$$\begin{aligned} \varvec{\hat{\vartheta }}_k&= [\hat{b}_{1,k}, \hat{b}_{2,k}, \ldots , \hat{b}_{n_b,k}, \hat{{{\varvec{c}}}}^\mathrm{T}_k, \hat{d}_{1,k}, \hat{d}_{2,k}, \ldots , \hat{d}_{n_d,k}]^\mathrm{T}, \end{aligned}$$
(27)
$$\begin{aligned} \hat{{{\varvec{c}}}}_k&= [\hat{c}_{1,k}, \hat{c}_{2,k}, \ldots , \hat{c}_{n_c,k}]^\mathrm{T}. \end{aligned}$$
(28)

The procedure for computing the parameter estimation vector \(\varvec{\hat{\vartheta }}_k\) in (19)–(28) is as follows (a usage sketch follows the list):

1. Set the data length \(L\), and pre-set a small \(\varepsilon >0\). Collect the input–output data {\(u(t)\), \(y(t)\): \(t=1, 2, \ldots , L\)}, and form \({{\varvec{Y}}}\) by using (21).

2. Let \(k=1\), and set the initial value \(\hat{\varvec{\vartheta }}_0=\mathbf{1}_{n_b+n_c+n_d}/p_0\), \(p_0=10^6\).

3. Form \(\hat{\varvec{\varphi }}_k(t)\) by using (23), and construct \(\hat{\varvec{\varPhi }}_k\) by using (22).

4. Update the parameter estimate \(\hat{\varvec{\vartheta }}_k\) by using (19).

5. Compute \(\hat{\bar{u}}_k(t)\) by using (24) and \(\hat{v}_k(t)\) by using (25).

6. If \(\Vert \hat{\varvec{\vartheta }}_k-\hat{\varvec{\vartheta }}_{k-1}\Vert >\varepsilon \), increase \(k\) by 1 and go to step 3; otherwise, obtain \(k\) and \(\hat{\varvec{\vartheta }}_k\).
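With finite measurement data, the window is simply the entire record (\(t=L\)), so the per-window sketch from Sect. 3 applies unchanged; a usage fragment, assuming the hypothetical newton_iterations helper and that u and y hold the \(L\) measured samples:

```python
# Offline use of the hypothetical newton_iterations sketch from Sect. 3:
# the data window is the full record, i.e. t = L in (7).
theta_hat = newton_iterations(u, y, n_b=2, n_c=2, n_d=1)
```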

The flowchart of computing the parameter estimation vector \(\hat{\varvec{\vartheta }}_k\) is shown in Fig. 3.

Fig. 3 The flowchart of computing the parameter estimate \(\varvec{\hat{\vartheta }}_k\)

Because a batch of input–output data is used to update the parameter estimates, the Newton iterative algorithm has a faster convergence rate than the gradient-based iterative algorithm in [27], although it requires computing the Hessian matrix and a matrix inversion. If the input–output data are sufficiently rich, then the matrix \(\hat{\varvec{\varPhi }}_k^\mathrm{T}\hat{\varvec{\varPhi }}_k\) is invertible.
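As a practical aside (not from the paper), when the data are not sufficiently rich, \(\hat{\varvec{\varPhi }}_k^\mathrm{T}\hat{\varvec{\varPhi }}_k\) can be near-singular; one hedged guard is to monitor its condition number and fall back to a pseudo-inverse:

```python
import numpy as np

def safe_ls_step(Phi, residual, cond_tol=1e10):
    """Least squares step with a guard against a near-singular Phi^T Phi."""
    gram = Phi.T @ Phi
    if np.linalg.cond(gram) > cond_tol:
        return np.linalg.pinv(Phi) @ residual   # minimum-norm fallback
    return np.linalg.solve(gram, Phi.T @ residual)
```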

5 Example

Consider the following nonlinear system:

$$\begin{aligned} y(t)&= B(z)\bar{u}(t)+D(z)v(t),\\ B(z)&= b_0\!+\!b_1z^{-1}\!+\!b_2z^{-2}=1\!+\!0.85z^{-1}\!+\!0.65z^{-2},\\ D(z)&= 1+d_1z^{-1}=1+0.40z^{-1},\\ \bar{u}(t)&= c_1u(t)+c_2u^2(t)=0.80u(t)+0.50u^2(t),\\ \varvec{\vartheta }&= [b_1, b_2, c_1, c_2, d_1]^\mathrm{T} \!=\! [0.85, 0.65, 0.80, 0.50, 0.40]^\mathrm{T}. \end{aligned}$$

In the simulation, the input \(u(t)\) is taken as an uncorrelated stochastic signal sequence with zero mean and unit variance, and \(v(t)\) as a white noise sequence with zero mean and variance \(\sigma ^2=0.10^2\) or \(\sigma ^2=0.50^2\). Taking the data length \(L=2000\) and applying the proposed Newton iterative algorithm in (19)–(28) to estimate the parameters of this example system, the parameter estimates and their errors \(\delta :=\Vert \hat{\varvec{\vartheta }}_k-\varvec{\vartheta }\Vert /\Vert \varvec{\vartheta }\Vert \) are shown in Tables 1 and 2 and Fig. 4.
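A sketch reproducing this simulation setup (the random seed is an assumption, and newton_iterations is the hypothetical helper from Sect. 3, so the numbers will differ slightly from Tables 1 and 2):

```python
import numpy as np

def simulate(rng, L, sigma):
    """Generate (u, y) from the example system with b0 = 1."""
    u = rng.standard_normal(L)              # input: zero mean, unit variance
    v = sigma * rng.standard_normal(L)      # white noise v(t)
    u_bar = 0.80 * u + 0.50 * u ** 2        # nonlinear block
    y = np.empty(L)
    for t in range(L):
        y[t] = (u_bar[t]
                + (0.85 * u_bar[t - 1] if t >= 1 else 0.0)
                + (0.65 * u_bar[t - 2] if t >= 2 else 0.0)
                + v[t]
                + (0.40 * v[t - 1] if t >= 1 else 0.0))
    return u, y

theta_true = np.array([0.85, 0.65, 0.80, 0.50, 0.40])
u, y = simulate(np.random.default_rng(0), L=2000, sigma=0.10)
theta_hat = newton_iterations(u, y, n_b=2, n_c=2, n_d=1)
print(theta_hat,
      np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true))
```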

Table 1 The parameter estimates and errors (\(\sigma ^2=0.10^2\), \(L\) = 2000)
Table 2 The parameter estimates and errors (\(\sigma ^2=0.50^2\), \(L\) = 2000)

From Tables 1 and 2 and Fig. 4, we can draw the following conclusions: (1) the estimation errors become small for iterations \(k\ge 5\), and the parameter estimates oscillate slightly because of the disturbance noise; (2) the parameter estimates are very close to their true values for large \(k\); (3) a lower noise level results in a smaller parameter estimation error.

Fig. 4 The parameter estimation errors versus \(k\) with different noise variances

Furthermore, using Monte Carlo simulations with 20 sets of noise realizations, the parameter estimates and the estimation variances of the Newton iterative algorithm are shown in Tables 3 and 4 for \(\sigma ^2=0.10^2\), \(\sigma ^2=0.50^2\), and \(L=2000\). From Tables 3 and 4, we can see that the average values of the parameter estimates are very close to the true parameters and that the variances are small for iterations \(k>5\). This validates the performance of the proposed Newton iterative algorithm.
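A matching sketch of the Monte Carlo study, reusing the hypothetical simulate and newton_iterations helpers above (for simplicity both the input and the noise are redrawn per run, whereas the paper varies only the noise realization):

```python
import numpy as np

estimates = []
for seed in range(20):                       # 20 Monte Carlo runs
    u, y = simulate(np.random.default_rng(seed), L=2000, sigma=0.10)
    estimates.append(newton_iterations(u, y, n_b=2, n_c=2, n_d=1))
estimates = np.asarray(estimates)
print(estimates.mean(axis=0))                # compare with the true parameters
print(estimates.var(axis=0))                 # estimation variances
```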

Table 3 The parameter estimates and variances based on the 20 Monte Carlo runs (\(\sigma ^2=0.10^2\))
Table 4 The parameter estimates and variances based on the 20 Monte Carlo runs (\(\sigma ^2=0.50^2\))

6 Conclusions

This paper proposes a Newton iterative algorithm for input nonlinear finite impulse response moving average systems. The output of the system can be expressed in a linear regression form in all the parameters by using the key variables separation technique. The simulation results indicate that the proposed algorithm converges fast and gives accurate estimates compared with the gradient-based iterative algorithm. The proposed algorithm can be combined with other identification methods [39, 40] to study identification problems of other linear or nonlinear systems with colored noise.