1 Introduction

Most real-life systems can be approximated reasonably well by a linear model in a small vicinity of a working point, but almost all processes are nonlinear when considered over a wider operating range, so better results can be obtained with a nonlinear model. Nonlinear models are commonly used to describe the behavior of many industrial processes because of the increased accuracy with which they capture the system behavior, and they have drawn great attention from many researchers. Nonlinear time series analysis is a useful and widely used tool for studying observed nonlinear behavior [1–5]. Among nonlinear models, the so-called block-oriented models, such as the Hammerstein model, the Wiener model and the Hammerstein–Wiener model, have turned out to be very useful for the estimation of nonlinear systems, and a great deal of work on their identification has been reported [6–20]. The Hammerstein model, which consists of a static nonlinear block followed by a linear time-invariant subsystem, is the focus of this paper.

The identification of Hammerstein systems has been studied extensively in recent years, and the large number of studies on this topic in the literature can be roughly classified into several categories: the over-parameterization method [21–24], the nonparametric method [25–28], the iterative method [29–32], the gradient algorithm [33–35], the kernel machine and space projection method [36–38] and the recursive algorithm [23, 39–42]. Recently, Ding et al. [43] put forward a decomposition-based Newton iterative method for the identification of a Hammerstein nonlinear FIR system with ARMA noise. Li [44] developed an algorithm based on Newton iteration for the identification of the Hammerstein model. Hong and Mitchell [45] used a Bezier–Bernstein approximation of the static nonlinear function to identify the model. Han and Raymond [46] extended the rank minimization approach to Hammerstein system identification by reconstructing the models. Zhang et al. [47] proposed a hierarchical gradient-based iterative parameter estimation algorithm for multivariable output error moving average systems. Chen and Chen [41] designed a weighted least squares (WLS)-based adaptive tracker for a class of Hammerstein systems. Vanbeylen et al. [48] presented a method for the identification of discrete-time Hammerstein systems from output measurements only. Ding et al. [35] proposed a modified stochastic gradient-based parameter estimation algorithm for dual-rate sampled-data systems. Sun and Liu [49] proposed a novel APSO-aided maximum-likelihood identification method for Hammerstein systems. Furthermore, some new nonparametric identification methods have been presented, such as using a neural network to model the static nonlinear part [50, 51] and identification without explicit parameterization of the nonlinearity, driven by piecewise constant inputs [52].

Ding and Chen [23] presented an excellent recursive least squares (RLS) method for Hammerstein models, which has good identification accuracy and convergence rate, and many researchers take Ding's method and models as a baseline for comparison [23, 49, 53]. However, Ding's RLS approach [23] is not applicable when the noise model is unknown, and better identification accuracy and faster convergence remain targets both in theory and in applications.

In this paper, the nonlinear recursive instrumental variables (RIV) method is proposed for the identification of a classic Hammerstein model. Better identification results are obtained in comparison with the RLS method, and the method can be applied when the noise structure is unknown, which shows that the proposed method is more flexible than the RLS method. Furthermore, the mean square convergence of the nonlinear RIV method is rigorously proved.

This paper is organized as follows: Sect. 2 describes the problem formulation based on Hammerstein models. Section 3 presents the instrumental variable method for Hammerstein models and three approaches to choosing instrumental variables in the linear RIV method. Section 4 derives the nonlinear RIV method and proves its mean square convergence. Section 5 illustrates the proposed approach on a classic Hammerstein model, where the nonlinear RIV method and the ARMAX-RLS method reported in the open literature [23] are compared in detail to show the effectiveness of the proposed method. Finally, conclusions are drawn in Sect. 6.

2 Problem statement

In a deterministic setting, the linear part of the system is characterized by a rational transfer function, and the system output \(y(t)\) is exactly observed. In practice, however, the system itself may be random and the observations may be corrupted by noise, so it is of practical importance to consider stochastic Hammerstein systems as shown in Fig. 1, composed of a nonlinear memoryless block \(f(\cdot )\) followed by a linear subsystem. Here \(u(t)\) is the system input, \(y(t)\) is the system output and \(v(t)\) is a white noise sequence. The true output \(x(t)\), the colored noise \(e(t)\) and the inner variable \(\overline{u} (t)\), which is the output of the nonlinear block, are unmeasurable. \(N(z)\) is the transfer function of the noise model, and \(G(z)\) is the transfer function of the linear part of the model [49].

Fig. 1 The discrete-time SISO Hammerstein system

The linear dynamical block in Fig. 1 is an ARMAX subsystem, and the nonlinear model in Fig. 1 has the following input–output relationship [23, 33]:

$$\begin{aligned} y(t)&= x(t)+e(t)\end{aligned}$$
(1)
$$\begin{aligned} x(t)&= G(z)\bar{{u}}(t)=\frac{B(z)}{A(z)}\bar{{u}}(t)\end{aligned}$$
(2)
$$\begin{aligned} e(t)&= N(z)v(t)=\frac{D(z)}{A(z)}v(t)\end{aligned}$$
(3)
$$\begin{aligned} A(z)&= 1+a_1 z^{-1}+a_2 z^{-2}+\cdots +a_{n_a } z^{-n_a }\end{aligned}$$
(4)
$$\begin{aligned} B(z)&= b_1 z^{-1}+b_2 z^{-2}+\cdots +b_{n_b } z^{-n_b }\end{aligned}$$
(5)
$$\begin{aligned} D(z)&= 1+d_1 z^{-1}+d_2 z^{-2}+\cdots +d_{n_d } z^{-n_d } \end{aligned}$$
(6)

This can be transformed as

$$\begin{aligned} y(t)&= -\sum \limits _{i=1}^{n_a } a_i y(t-i)+\sum \limits _{i=1}^{n_b }b_i \bar{u}(t-i)\nonumber \\&\quad +\,\sum \limits _{i=1}^{n_d}d_i v(t-i)+v(t) \end{aligned}$$
(7)

where \(v(t)\) is a white noise sequence with the normal distribution \(v(t)\sim N(0,\sigma _v^2 )\). For a nonparametric \(f(\cdot )\), the value \(f(u)\) is estimated for any fixed \(u\). In the parametric case, \(f(\cdot )\) is either expressed as a linear combination of known basis functions with unknown coefficients or a piecewise linear function with unknown joints and slopes; hence, identification of the nonlinear block in this case is equivalent to estimating unknown parameters. In this paper, the nonlinear part is expressed in a known basis \((\omega _1 ,\omega _2 ,\ldots ,\omega _{n_c })\) with coefficients \((c_1 ,c_2 ,\ldots ,c_{n_c } )\):

$$\begin{aligned} f( {u( t)})&= \bar{u}(t)=c_1\omega _1(u(t))+c_2\omega _2(u(t))\nonumber \\&\quad +\,\ldots +c_{n_c}\omega _{n_c}(u(t))=\sum _{i=1}^{n_c}c_i\omega _{i}(u(t)) \end{aligned}$$
(8)

Notice that the parameterization is not unique. In order to obtain a unique parameter estimate, without loss of generality, one of the gains of \(f(\cdot )\) must be fixed. Here, the first coefficient of the nonlinear function is assumed to equal 1, i.e., \(c_1 =1\) [12, 54].

Substituting Eq. (8) into Eq. (7) gives

$$\begin{aligned} y(t)&= -\sum _{i=1}^{n_a } {a_i y(t-i)} +\sum _{j=1}^{n_b } {b_j } \sum _{i=1}^{n_c } {c_i \omega _i (u(t-j))} \nonumber \\&\quad +\,\sum _{i=1}^{n_d } {d_i v(t-i)} +v(t) \end{aligned}$$
(9)

This paper aims to present the nonlinear RIV method for estimating the unknown parameters \(a_i (i=1,2,\ldots ,n_a ), b_j (j=1,2,\ldots ,n_b )\) and \(c_k (k=1,2,\ldots ,n_c )\) of the nonlinear ARMAX model from the input and output data \(\{u(t)\},\{y(t)\}\), and to establish the convergence property of the algorithm.
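To make the parameterization of Eq. (8) concrete, here is a minimal Python sketch (function and variable names are hypothetical, not from the original) that evaluates \(\bar{u}(t)\) for the polynomial basis \(\omega _i (u)=u^{i}\) used later in Sect. 5, with \(c_1\) fixed to 1:

```python
import numpy as np

def nonlinear_block(u, c, basis=None):
    """Evaluate u_bar(t) = sum_i c_i * omega_i(u(t)), Eq. (8).

    u     : scalar or array of input samples u(t)
    c     : coefficients [c_1, ..., c_nc]; c[0] is fixed to 1
    basis : list of basis functions omega_i; defaults to the
            polynomial basis omega_i(u) = u**i of the example
    """
    u = np.asarray(u, dtype=float)
    if basis is None:
        basis = [lambda x, i=i: x**i for i in range(1, len(c) + 1)]
    return sum(ci * w(u) for ci, w in zip(c, basis))

# Example: f(u) = u + 0.5 u^2 + 0.25 u^3, cf. Eq. (53)
u_bar = nonlinear_block([0.5, -1.0], c=[1.0, 0.5, 0.25])
```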

3 Instrumental variables

Define the parameter vector, referring to [23, 33], as

$$\begin{aligned}&\varvec{\uptheta } =\left[ {{\begin{array}{c} {\mathbf{a}} \\ {c_1 \mathbf{b}} \\ {c_2 \mathbf{b}} \\ {\vdots } \\ {c_{n_c} \mathbf{b}} \\ \end{array} }} \right] \in R^{n_0 },\quad \mathbf{a}=\left[ {{\begin{array}{c} {a_1 } \\ {a_2 } \\ {\vdots } \\ {a_{n_a } } \\ \end{array} }} \right] \in R^{n_a },\nonumber \\&\mathbf{b}=\left[ {{\begin{array}{c} {b_1 } \\ {b_2 } \\ {\vdots } \\ {b_{n_b } } \\ \end{array} }} \right] \in R^{n_b }, \quad e(t)=\sum _{i=1}^{n_d } {d_i v(t-i)} +v(t) \end{aligned}$$
(10)

Equation (9) can be written, with \(n_0 =n_a +n_b n_c \), as

$$\begin{aligned} y(t)=\mathbf{h}^{T}(t)\varvec{\uptheta }+e(t) \end{aligned}$$
(11)

or

$$\begin{aligned} \mathbf Y _L =\mathbf H _L \varvec{\uptheta } +\mathbf e _L \end{aligned}$$
(12)

where

$$\begin{aligned}&\mathbf{h}(t)=\left[ {{\begin{array}{c} {\varvec{\uppsi }_0 (t)} \\ {\varvec{\uppsi }_1 (t)} \\ {\varvec{\uppsi }_2 (t)} \\ {\vdots } \\ {\varvec{\uppsi }_{n_c } (t)} \\ \end{array} }} \right] \in R^{n_0 },\nonumber \\&\varvec{\uppsi }_0 (t)=\left[ {{\begin{array}{c} {-y(t-1)} \\ {-y(t-2)} \\ {\vdots } \\ {-y(t-n_a )} \\ \end{array} }} \right] \in R^{n_a },\nonumber \\&\mathbf{e}_L =\left[ {\begin{array}{c} e(1) \\ e(2) \\ {\vdots } \\ e(L) \\ \end{array}} \right] ,\quad \mathbf{Y}_L =\left[ {\begin{array}{c} y(1) \\ y(2) \\ {\vdots } \\ y(L) \\ \end{array}} \right] , \nonumber \\&\mathbf{H}_L =\left[ {\begin{array}{c} \mathbf{h}^{T}(1) \\ \mathbf{h}^{T}(2) \\ \vdots \\ \mathbf{h}^{T}(L) \\ \end{array}} \right] ,\\&\varvec{\uppsi }_j (t)=\left[ {{\begin{array}{c} {\omega _j (u(t-1))} \\ {\omega _j (u(t-2))} \\ {\vdots } \\ {\omega _j (u(t-n_b ))} \\ \end{array} }} \right] \in R^{n_b },\nonumber \\&\qquad j=1,2,\ldots ,n_c\nonumber \end{aligned}$$
(13)
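As an illustration of Eq. (13), the following sketch (a hypothetical helper, assuming the polynomial basis \(\omega _j (u)=u^{j}\)) stacks \(\mathbf{h}(t)\) from past outputs and basis-transformed past inputs:

```python
import numpy as np

def regressor(y, u, t, na, nb, nc):
    """Build h(t) of Eq. (13): psi_0(t) from past outputs, then
    psi_j(t) = [omega_j(u(t-1)), ..., omega_j(u(t-nb))] for j = 1..nc.
    y and u are 0-indexed arrays; requires t >= max(na, nb)."""
    psi0 = [-y[t - i] for i in range(1, na + 1)]
    psis = [u[t - i]**j for j in range(1, nc + 1)
            for i in range(1, nb + 1)]
    return np.array(psi0 + psis)  # length n0 = na + nb * nc
```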

The least squares estimation formula is as follows:

$$\begin{aligned} {\hat{\varvec{\uptheta }}}_\mathrm{LS} =(\mathbf{H}_L^T \mathbf{H}_L )^{-1}\mathbf{H}_L^T \mathbf{Y}_L =\varvec{\uptheta }+\left( \frac{1}{L}\mathbf{H}_L^T \mathbf{H}_L \right) ^{-1}\left( \frac{1}{L}\mathbf{H}_L^T \mathbf{e}_L \right) \end{aligned}$$
(14)

where

$$\begin{aligned} \left\{ {{\begin{array}{l} {\frac{1}{L}\mathbf{H}_L^T \mathbf{H}_L =\frac{1}{L}\sum \nolimits _{t=1}^L {\mathbf{h}(t)\mathbf{h}^{T}(t)} \mathop {\longrightarrow }\limits ^{W.P.1}_{L\rightarrow \infty } E\left\{ \mathbf{h}(t)\mathbf{h}^{T}(t)\right\} } \\ {\frac{1}{L}\mathbf{H}_L^T \mathbf{e}_L =\frac{1}{L}\sum \nolimits _{t=1}^L {\mathbf{h}(t)e(t)} \mathop {\longrightarrow }\limits ^{W.P.1}_{L\rightarrow \infty } E\left\{ \mathbf{h}(t)e(t)\right\} } \\ \end{array} }} \right. \end{aligned}$$
(15)

The Fréchet theorem [55] is introduced here.

Theorem 1

Assume that \(\{x(t)\}\) is a sequence of random variables that converges with probability 1 to a constant \(x_0 \); then

$$\begin{aligned} f(x(t))\mathop {\longrightarrow }\limits ^{W.P.1}_{t\rightarrow \infty } f(x_0 ) \end{aligned}$$
(16)

or

$$\begin{aligned} P\mathop {\lim }\limits _{t\rightarrow \infty } f(x(t))=f(x_0 ) \end{aligned}$$
(17)

where \(f(\cdot )\) is a continuous scalar function.

Theorem 2

Assume that the matrices \({\varvec{A}}_t \) and \({\varvec{B}}_t\) have probability limits and that their dimensions do not change as \(t\) increases. Applying Theorem 1 gives

$$\begin{aligned} \left\{ {\begin{array}{l} P\mathop {\lim }\limits _{t\rightarrow \infty } \left( {\varvec{A}}_t {\varvec{B}}_t \right) =\left( P\mathop {\lim }\limits _{t\rightarrow \infty } {\varvec{A}}_t \right) \left( P\mathop {\lim }\limits _{t\rightarrow \infty } {\varvec{B}}_t \right) \\ P\mathop {\lim }\limits _{t\rightarrow \infty } \left( {\varvec{A}}_t^{-1} \right) =\left( P\mathop {\lim }\limits _{t\rightarrow \infty } {\varvec{A}}_t \right) ^{-1} \\ \end{array}} \right. \end{aligned}$$
(18)

Applying the two convergence theorems above yields

$$\begin{aligned} {\hat{\varvec{\uptheta }}}_\mathrm{LS} \mathop {\longrightarrow }\limits ^{W.P.1}_{L\rightarrow \infty } \varvec{\uptheta } +\left[ {E\{\mathbf{h}(t)\mathbf{h}^{T}(t)\}} \right] ^{-1}E\{\mathbf{h}(t)e(t)\} \end{aligned}$$
(19)

If \(e(t)\) is white noise, \(E\{\mathbf{h}(t)e(t)\}=\mathbf{0}\); thus \({\hat{\varvec{\uptheta }}}_\mathrm{LS} \mathop {\longrightarrow }\limits ^{W.P.1}_{L\rightarrow \infty } \varvec{\uptheta }\), and an unbiased estimate is obtained.

If \(e(t)\) is not white noise, \(E\{\mathbf{h}(t)e(t)\}\ne \mathbf{0}\). In order to obtain an unbiased estimate of the parameters, i.e., so that \({\hat{\varvec{\uptheta }}}_\mathrm{LS} \mathop {\longrightarrow }\limits ^{W.P.1}_{L\rightarrow \infty } \varvec{\uptheta }\) still holds, an instrumental matrix is defined as

$$\begin{aligned} \mathbf{H}_L^*=\left[ {{\begin{array}{c} {\mathbf{h}^{*T}(1)} \\ {\mathbf{h}^{*T}(2)} \\ {\vdots } \\ {\mathbf{h}^{*T}(L)} \\ \end{array} }} \right] \end{aligned}$$
(20)

Two conditions are given as follows:

(a) \(\frac{1}{L}\mathbf{H}_L^{*T} \mathbf{H}_L \mathop {\longrightarrow }\limits ^{W.P.1}_{L\rightarrow \infty } E\{\mathbf{h}^{*}(t)\mathbf{h}^{T}(t)\}\) is a nonsingular matrix;

(b) \(\mathbf{h}^{*}(t)\) is independent of \(e(t)\), which means \(\frac{1}{L}\mathbf{H}_L^{*T} \mathbf{e}_L \mathop {\longrightarrow }\limits ^{W.P.1}_{L\rightarrow \infty } E\{\mathbf{h}^{*}(t)e(t)\}=\mathbf{0}\), where \(\mathbf{h}^{*}(t)\) are the instrumental variables.

If the instrumental variables meet the two conditions above, it gives

$$\begin{aligned}&{\hat{\varvec{\uptheta }}}_\mathrm{IV} =(\mathbf{H}_L^{*T} \mathbf{H}_L )^{-1}\mathbf{H}_L^{*T} \mathbf{Y}_L \nonumber \\&\qquad =\varvec{\uptheta }+\left( \frac{1}{L}\mathbf{H}_L^{*T} \mathbf{H}_L \right) ^{-1}\left( \frac{1}{L}\mathbf{H}_L^{*T} \mathbf{e}_L \right) \nonumber \\&\qquad \mathop {\longrightarrow }\limits ^{W.P.1}_{L\rightarrow \infty }\varvec{\uptheta }+(E\{\mathbf{h}^{*}(t)\mathbf{h}^{T}(t)\})^{-1}E\{\mathbf{h}^{*}(t)e(t)\}=\varvec{\uptheta }\nonumber \\ \end{aligned}$$
(21)

where \({\hat{\varvec{\uptheta }}}_\mathrm{IV} \) is the parameter estimation with instrumental variables method.

It can be seen that if the chosen instrumental variables satisfy the two conditions above, then an unbiased and consistent parameter estimate is obtained. The choice of suitable instrumental variables is therefore crucial, and the following three approaches, commonly used in the linear RIV method [56], are introduced first.

Fig. 2 The choice of the instrumental variables

In Fig. 2, \(m(t)\) denotes the instrumental variable, and \(\mathbf{h}^{*}(t)\) is given as follows:

$$\begin{aligned}&\mathbf{h}^{*}(t)=\left[ {{\begin{array}{c} {{\varvec{\uppsi }}_0^*(t)} \\ {{\varvec{\uppsi }}_1 (t)} \\ {{\varvec{\uppsi }}_2 (t)} \\ {\vdots } \\ {{\varvec{\uppsi }}_{n_c } (t)} \\ \end{array} }} \right] \in R^{n_0 },\nonumber \\&{\varvec{\uppsi }}_0^*(t)=\left[ {{\begin{array}{c} {-m(t-1)} \\ {-m(t-2)} \\ {\vdots } \\ {-m(t-n_a )} \\ \end{array} }} \right] \in R^{n_a } \end{aligned}$$
(22)
(1) Adaptive filtering method

When \(u(t)\) is a persistently exciting signal, \(E\{\mathbf{h}^{*}(t)\mathbf{h}^{T}(t)\}\) is nonsingular. Since \(m(t)\) is related only to \(u(t)\), \(\mathbf{h}^{*}(t)\) is necessarily uncorrelated with the noise, hence \(E\{\mathbf{h}^{*}(t)e(t)\}=\mathbf{0}\).

The instrumental model can be regarded as an adaptive filter. The instrumental variables can be obtained by the following method:

$$\begin{aligned} m(t)=\mathbf{h}^{*T}(t)\hat{\varvec{\uptheta }}(t) \end{aligned}$$

or

$$\begin{aligned} \left\{ {\begin{array}{l} m(t)=\mathbf{h}^{*T}(t){\bar{\varvec{\uptheta }}}(t) \\ {\bar{\varvec{\uptheta }}}(t)=(1-\alpha ){\bar{\varvec{\uptheta }}}(t-1)+\alpha {\hat{\varvec{\uptheta }}}(t-d) \\ \end{array}} \right. \end{aligned}$$
(23)

where \(\alpha \in (0.01,0.1)\) and \(d\in (0,10)\), and \({\hat{\varvec{\uptheta }}}(t)\) is the parameter estimate at time \(t\) obtained with the instrumental variable method; it can be calculated recursively. If the two conditions (a) and (b) are satisfied, the two forms of \(m(t)\) above are equivalent.

(2) Tally principle

If the noise \(e(t)\) can be described by the following model:

$$\begin{aligned} e(t)=D\left( z^{-1}\right) v(t) \end{aligned}$$
(24)

where \(v(t)\) is uncorrelated stochastic noise with zero mean and

$$\begin{aligned} D\left( z^{-1}\right) =1+d_1 z^{-1}+d_2 z^{-2}+\cdots +d_{n_d } z^{-n_d } \end{aligned}$$
(25)

The instrumental variables can be chosen as

$$\begin{aligned} m(t)=y(t-n_d ) \end{aligned}$$
(26)
(3) Pure lag method

The instrumental model can be regarded as a pure lag element. The instrumental variables can be chosen as follows:

$$\begin{aligned} m(t)=u(t-n_b ) \end{aligned}$$
(27)

where \(n_b \) is the order of the polynomial \(B(z^{-1})\). Evidently, if \(u(t)\) is a persistently exciting signal uncorrelated with \(e(t)\), the instrumental variables will meet the two conditions above.

A detailed description of the three methods for choosing instrumental variables can be found in [56].
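As a summary, here is a minimal sketch of the three choices (function name and arguments are hypothetical; the adaptive-filtering branch assumes the auxiliary-model quantities \(\mathbf{h}^{*}(t)\) and \(\bar{\varvec{\uptheta }}(t)\) are maintained as in Eq. (23)):

```python
def instrument(kind, t, u, y, nb, nd, h_star=None, theta_bar=None):
    """Return m(t) for the three choices of Sect. 3."""
    if kind == "adaptive":   # auxiliary-model output, Eq. (23)
        return float(h_star @ theta_bar)
    if kind == "tally":      # delayed output, Eq. (26)
        return y[t - nd]
    if kind == "lag":        # delayed input, Eq. (27)
        return u[t - nb]
    raise ValueError(kind)
```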

4 The nonlinear recursive instrumental variables method

4.1 The nonlinear RIV algorithm

The following gives the derivation of the nonlinear RIV algorithm.

Replacing \(\mathbf{h}(t)\) by \(\mathbf{h}^{*}(t)\) in Eq. (14) gives

$$\begin{aligned} \hat{\varvec{\uptheta }}_\mathrm{IV}&= (\mathbf{H}_L^{*T} \mathbf{H}_L )^{-1}\mathbf{H}_L^{*T} \mathbf Y _L \nonumber \\&= \left[ {\sum _{i=1}^L {\mathbf{h}^{*}(i)\mathbf{h}^{T}(i)} } \right] ^{-1}\left[ {\sum _{i=1}^L {\mathbf{h}^{*}(i)y(i)} } \right] \end{aligned}$$
(28)

Define

$$\begin{aligned} \left\{ {\begin{array}{ll} {\varvec{P}}^{-1}(t)&{}=\sum \limits _{i=1}^t {\mathbf{h}^{*}(i)\mathbf{h}^{T}(i)} \\ {\varvec{K}}(t)&{}={\varvec{P}}(t)\mathbf{h}^{*}(t) \\ \end{array}} \right. \end{aligned}$$
(29)

Then,

$$\begin{aligned} {\varvec{P}}^{-1}(t)&= \sum _{i=1}^{t-1} {\mathbf{h}^{*}(i)\mathbf{h}^{T}(i)+} \mathbf{h}^{*}(t)\mathbf{h}^{T}(t)\nonumber \\&= {\varvec{P}}^{-1}(t-1)+\mathbf{h}^{*}(t)\mathbf{h}^{T}(t) \end{aligned}$$
(30)
$$\begin{aligned} {\varvec{P}}^{-1}(t-1)\hat{\varvec{\uptheta }}_\mathrm{IV} (t-1)=\sum _{i=1}^{t-1} {\mathbf{h}^{*}(i)y(i)} \end{aligned}$$
(31)

Substituting Eqs. (30) and (31) into Eq. (28) gives

$$\begin{aligned}&\hat{\varvec{\uptheta }}_\mathrm{IV} (t)\;={\varvec{P}}(t)\left[ {\sum _{i=1}^t {\mathbf{h}^{*}(i)y(i)} } \right] \nonumber \\&\quad =\hat{\varvec{\uptheta }}_\mathrm{IV} (t-1)+{\varvec{P}}(t)\mathbf{h}^{*}(t)[y(t)-\mathbf{h}^{T}(t) \hat{\varvec{\uptheta }}_\mathrm{IV} (t-1)].\nonumber \\ \end{aligned}$$
(32)

Applying the matrix inversion formula \(({\varvec{A}}+{\varvec{BC}})^{-1}={\varvec{A}}^{-1}-{\varvec{A}}^{-1}{\varvec{B}}({\varvec{I}}+ {\varvec{CA}}^{-1}{\varvec{B}})^{-1}{\varvec{CA}}^{-1}\) to Eq. (30) yields

$$\begin{aligned} {\varvec{P}}(t)&= \{{\varvec{I}}-{\varvec{P}}(t-1)\mathbf{h}^{*}(t)\mathbf{h}^{T}(t) [1\nonumber \\&\quad +\,\mathbf{h}^{T}(t){\varvec{P}}(t-1)\mathbf{h}^{*}(t)]^{-1}\}{\varvec{P}}(t-1) \end{aligned}$$
(33)
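This inversion identity is easy to check numerically; a quick sketch (random, well-conditioned test matrices are assumptions for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # well conditioned
B = rng.standard_normal((4, 1))                  # rank-one factors
C = rng.standard_normal((1, 4))
Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + B @ C)
rhs = Ainv - Ainv @ B @ np.linalg.inv(np.eye(1) + C @ Ainv @ B) @ C @ Ainv
assert np.allclose(lhs, rhs)  # the (A + BC)^{-1} identity above
```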

Substituting Eq. (33) into Eq. (29) gives

$$\begin{aligned} {\varvec{K}}(t)&= {\varvec{P}}(t)\mathbf{h}^{*}(t)\nonumber \\&= \left[ {1+\mathbf{h}^{T}(t) {\varvec{P}}(t-1)\mathbf{h}^{*}(t)} \right] ^{-1}{\varvec{P}}(t-1)\mathbf{h}^{*}(t)\nonumber \\ \end{aligned}$$
(34)

Finally, the nonlinear RIV algorithm is obtained as follows:

$$\begin{aligned} \hat{\varvec{\uptheta }}(t)&= \hat{\varvec{\uptheta } }(t-1)+{\varvec{K}}(t)[y(t)-\mathbf{h}^{T}(t)\hat{\varvec{\uptheta }}(t-1)]\end{aligned}$$
(35)
$$\begin{aligned} {\varvec{K}}(t)&= {\varvec{P}}(t-1)\mathbf{h}^{*}(t)[1+\mathbf{h}^{T}(t) {\varvec{P}}(t-1)\mathbf{h}^{*}(t)]^{-1}\end{aligned}$$
(36)
$$\begin{aligned} {\varvec{P}}(t)&= [{\varvec{I}}-{\varvec{K}}(t)\mathbf{h}^{T}(t)]{\varvec{P}}(t-1) \end{aligned}$$
(37)

where \(\mathbf{h}^{*}(t)\) is the instrument vector defined in Eq. (22); it can be chosen by any of the three methods described above. If the instrumental variables take the adaptive filtering form, a few steps of the least squares method are needed to obtain an initial parameter estimate \({\hat{\varvec{\uptheta }}}(t)\) as the starting point of the nonlinear RIV method. To initialize the algorithm, \({\varvec{P}}(0)=p_0 {\varvec{I}}\) with \(p_0 \) a large positive real number, e.g., \(p_0 =10^{6}\), and \({\hat{\varvec{\uptheta }}}(0)=10^{-6}\mathbf{1}_{n_0 }\).
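For concreteness, a compact Python sketch of one update of Eqs. (35)–(37) follows (names hypothetical; the instrument \(\mathbf{h}^{*}(t)\) comes from any of the three choices of Sect. 3):

```python
import numpy as np

def riv_step(theta, P, h, h_star, y_t):
    """One recursion of Eqs. (35)-(37).
    theta : current estimate, shape (n0,)
    P     : gain matrix P(t-1), shape (n0, n0)
    h     : regressor h(t); h_star : instrument h*(t)
    y_t   : scalar output y(t)"""
    K = P @ h_star / (1.0 + h @ P @ h_star)  # Eq. (36)
    theta = theta + K * (y_t - h @ theta)    # Eq. (35)
    P = P - np.outer(K, h) @ P               # Eq. (37): (I - K h^T) P(t-1)
    return theta, P

# Initialization as in the text
n0 = 8
P = 1e6 * np.eye(n0)        # P(0) = p0 * I with p0 large
theta = 1e-6 * np.ones(n0)  # theta(0) small but nonzero
```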

4.2 Mean square convergence of the nonlinear RIV method

First, two lemmas [57] are introduced.

Lemma 1

Assume the eigenvalues of matrix \(A\in R^{n\times n}\) are \(\lambda _i [A], i=1,2,\ldots ,n\); then the eigenvalues of matrix \(A+s{\varvec{I}}\) are \(\lambda _i [A+s{\varvec{I}}]=\lambda _i [A]+s, i=1,2,\ldots ,n\), where \(s\) is a constant.

Lemma 2

Assume the eigenvalues of matrix \(A\in R^{n\times n}\) are \(\lambda _i [A], i=1,2,\ldots ,n, \mathop {\min }\nolimits _i \{\lambda _i [A]\}=\alpha ,\) then \(A^{T}A\ge \alpha ^{2}{\varvec{I}}\), \((A+s{\varvec{I}})^{T}(A+s{\varvec{I}})\ge (\alpha -s)^{2}{\varvec{I}}\), where \(0<s<\alpha \).

Theorem 3

Assume that \(\{e(t)\}\) is a random noise sequence with zero mean and bounded variance, namely \(E[||e(t)||^{2}]=\sigma _e^2 (t)\le \sigma ^{2}<\infty \); the input vectors \(\{u(t)\}\) and the instrumental vectors \(\{\mathbf{h}^{*}(t)\}\) are uncorrelated with \(\{e(t)\}\); and the system satisfies the weak persistence of excitation condition, which ensures that the matrix \(\frac{1}{t}\mathbf{H}_t^{*T}\mathbf{H}_t \) is nonsingular, that is

$$\begin{aligned}&A_1 :\min _i \left\{ {\left| {\lambda _i \left[ {\frac{1}{t}\mathbf{H}_t^{*T}\mathbf{H}_t } \right] } \right| } \right\} \nonumber \\&\quad =\min _i \left\{ {\left| {\lambda _i \left[ {\frac{1}{t}\sum _{i=1}^t {\mathbf{h}^{*}(i)\mathbf{h}^{T}(i)} } \right] } \right| } \right\} \ge \alpha >0,\;a.s. \nonumber \\&A_2:\frac{1}{t}\sum _{i=1}^t {\mathbf{h}^{*}(i)\mathbf{h}^{T}(i)} =\frac{1}{t}\mathbf{H}_t^{*T}\mathbf{H}_t \le \beta {\varvec{I}}<\infty ,\;a.s. \end{aligned}$$

Define \(||X||^{2}=tr[XX^{T}]\).

Assume \(E[||\hat{\varvec{\uptheta }}(0)-\varvec{\uptheta } ||^{2}]\le M_0 <\infty \) and that \(\hat{\varvec{\uptheta }}(0)\) is uncorrelated with \(\{e(t)\}\); then the parameter estimation error of the nonlinear RIV method converges to zero at the rate \({O}(\frac{1}{\sqrt{t}})\), that is

$$\begin{aligned}&E[||\hat{\varvec{\uptheta }}(t)-\varvec{\uptheta } ||^{2}]\le 2\left[ {\frac{||{\varvec{P}}^{-1}(0)||^{2}M_0 }{(\alpha t-a)^{2}}+\frac{n\beta \sigma ^{2}t}{(\alpha t-a)^{2}}} \right] \nonumber \\&\quad \triangleq f(t)\quad \hbox {or}\quad \mathop {\lim }\limits _{t\rightarrow \infty } E[||\hat{\varvec{\uptheta }}(t)-{\varvec{\uptheta }} ||^{2}]=0 \end{aligned}$$
(38)

where \(n\) is the rank of the matrix \(\mathbf{H}_t^*{\varvec{P}}^{T}(t){\varvec{P}}(t)\mathbf{H}_t^{*T}\), \(\beta \) is the largest eigenvalue of \(\mathbf{H}_t^*{\varvec{P}}^{T}(t){\varvec{P}}(t)\mathbf{H}_t^{*T}\), and \({\varvec{P}}(0)=P_0 =\frac{1}{a}{\varvec{I}}, 0<a<1\), say \(a=10^{-6}\).
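To see where the rate comes from, note that for large \(t\) the second term of \(f(t)\) dominates, \(f(t)\approx 2n\beta \sigma ^2 t/(\alpha t-a)^2 =O(1/t)\), so the error norm decays as \(O(1/\sqrt{t})\). A quick numerical sketch of the bound (all constants are illustrative assumptions, not values from the paper):

```python
import numpy as np

alpha, a, beta, sigma2, n, M0 = 0.5, 1e-6, 2.0, 0.25, 8, 1.0
Pinv0_norm = a * np.sqrt(n)  # ||P^{-1}(0)|| with P(0) = I/a
t = np.logspace(1, 5, 5)
f = 2 * (Pinv0_norm**2 * M0 + n * beta * sigma2 * t) / (alpha * t - a)**2
print(np.sqrt(f) * np.sqrt(t))  # roughly constant, i.e., O(1/sqrt(t))
```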

Referring to [58, 59], the following gives the proof of Theorem 3.

Define

$$\begin{aligned} \tilde{\varvec{\uptheta } }(t)=\hat{\varvec{\uptheta }}(t)-\varvec{\uptheta } \end{aligned}$$
(39)

Combining Eqs. (30) and (35) gives

$$\begin{aligned} \tilde{\varvec{\uptheta } }(t)=[{\varvec{I}}-{\varvec{P}}(t)\mathbf{h}^{*}(t) \mathbf{h}^{T}(t)]\tilde{\varvec{\uptheta }}(t-1)+{\varvec{P}}(t)\mathbf{h}^{*}(t)e(t) \end{aligned}$$
(40)

Multiplying both sides of Eq. (30) by \({\varvec{P}}(t)\) and rearranging gives

$$\begin{aligned} {\varvec{I}}-{\varvec{P}}(t)\mathbf{h}^{*}(t)\mathbf{h}^{T}(t)={\varvec{P}}(t){\varvec{P}}^{-1}(t-1) \end{aligned}$$
(41)

Substituting Eq. (41) into Eq. (40) and iterating gives

$$\begin{aligned} \tilde{\varvec{\uptheta } }(t)={\varvec{P}}(t){\varvec{P}}^{-1}(0)\tilde{\varvec{\uptheta } }(0)+{\varvec{P}}(t)\mathbf{H}_t^{*T} \mathbf{e}_t =\gamma _1 (t)+\gamma _2 (t) \end{aligned}$$
(42)

where \(\mathbf{H}_t^{*T} \mathbf{e}_t =\sum \nolimits _{i=1}^t {\mathbf{h}^{*}(i)e(i)} \), \({\varvec{P}}^{-1}(t)=\mathbf{H}_t^{*T}\mathbf{H}_t +{\varvec{P}}^{-1}(0)\), \(\gamma _1 (t)={\varvec{P}}(t){\varvec{P}}^{-1}(0)\tilde{\varvec{\uptheta } }(0)\) and \(\gamma _2 (t)={\varvec{P}}(t)\mathbf{H}_t^{*T} \mathbf{e}_t \).

Applying Lemmas 1 and 2, for any \(t\), \({\varvec{P}}^{-T}(t)=(\mathbf{H}_t^{*T}\mathbf{H}_t )^{T}+{\varvec{P}}^{-1}(0)\).

Define

$$\begin{aligned} \frac{1}{t}\mathbf{H}_t^{*T}\mathbf{H}_t =A,\quad \frac{1}{t}{\varvec{P}}^{-1}(0)=s{\varvec{I}}=\frac{a}{t}{\varvec{I}} \end{aligned}$$
(43)

Then,

$$\begin{aligned}&\mathop {\min }\limits _i \left\{ {\left| {\lambda _i \left[ {\frac{1}{t}\mathbf{H}_t^{*T}\mathbf{H}_t } \right] } \right| } \right\} =\alpha \end{aligned}$$
(44)
$$\begin{aligned}&\qquad \left[ {\frac{1}{t}{\varvec{P}}^{-1}(t)} \right] ^{T}\left[ {\frac{1}{t}{\varvec{P}}^{-1}(t)} \right] =(A+s{\varvec{I}})^{T}(A+s{\varvec{I}})\nonumber \\&\quad \ge \left( \alpha -\frac{a}{t}\right) ^{2}{\varvec{I}},\nonumber \\&\qquad {\varvec{P}}^{T}(t){\varvec{P}}(t)\le \frac{1}{(\alpha t-a)^{2}}{\varvec{I}} \end{aligned}$$
(45)

Thus, it gives

$$\begin{aligned} 0&\le E[||\gamma _1 (t)||^{2}]\nonumber \\&= E\left[ \tilde{\varvec{\uptheta } }^{T}(0){\varvec{P}}^{-T}(0){\varvec{P}}^{T}(t){\varvec{P}}(t){\varvec{P}}^{-1}(0)\tilde{\varvec{\uptheta } }(0)\right] \nonumber \\&\le \frac{||{\varvec{P}}^{-1}(0)||^{2}M_0 }{(\alpha t-a)^{2}}\end{aligned}$$
(46)
$$\begin{aligned} 0&\le E[||\gamma _2 (t)||^{2}]=E\left[ \mathbf{e}_t^T \mathbf{H}_t^*{\varvec{P}}^{T}(t){\varvec{P}}(t)\mathbf{H}_t^{*T} \mathbf{e}_t \right] \nonumber \\&\le \frac{n\beta \sigma ^{2}t}{(\alpha t-a)^{2}} \end{aligned}$$
(47)

Substituting Eqs. (46) and (47) into Eq. (42) gives

$$\begin{aligned} 0\le E||\tilde{\varvec{\uptheta } }(t)||^{2}&= E||\gamma _1 (t)+\gamma _2 (t)||^{2}\le 2E\left[ ||\gamma _1 (t)||^{2}\right. \nonumber \\&\left. \quad +\,||\gamma _2 (t)||^{2}\right] \nonumber \\&= 2\left( {\frac{||{\varvec{P}}^{-1}(0)||^{2}M_0 }{(\alpha t-a)^{2}}+\frac{n\beta \sigma ^{2}t}{(\alpha t-a)^{2}}} \right) \nonumber \\&\quad \hbox {or}\quad \mathop {\lim }\limits _{t\rightarrow \infty } E[||\hat{\varvec{\uptheta }}(t)-\varvec{\uptheta }||^{2}]=0 \end{aligned}$$
(48)

This completes the proof of the mean square convergence of the nonlinear RIV method.

From the discussion above, it can be seen that although \(e(t)\) is colored noise, as long as the system is persistently excited and the noise \(e(t)\) has zero mean and bounded variance and is uncorrelated with \(\hat{\varvec{\uptheta }}(0)\), so that conditions \(A_1\) and \(A_2 \) are satisfied, the nonlinear RIV method converges in mean square for the identification of Hammerstein models. That is, the parameter estimation error \(\tilde{\varvec{\uptheta }}(t)\) of the proposed method converges to zero at the rate \({O}(\frac{1}{\sqrt{t}})\), which guarantees that the nonlinear RIV method copes well with colored noise.

5 Example

Owing to the commonly recognized effectiveness of Ding's RLS algorithm, Ding's example [23] is taken as the model to demonstrate the improved identification performance of the new algorithm. It is the following Hammerstein ARMAX system:

$$\begin{aligned} A(z)y(t)&= B(z)\bar{{u}}(t)+D(z)v(t)\end{aligned}$$
(49)
$$\begin{aligned} A(z)&= 1+a_1 z^{-1}+a_2 z^{-2}=1-1.60z^{-1}\nonumber \\&\quad +\,0.80z^{-2}\end{aligned}$$
(50)
$$\begin{aligned} B(z)&= b_1 z^{-1}+b_2 z^{-2}=0.85z^{-1}+0.65z^{-2}\nonumber \\\end{aligned}$$
(51)
$$\begin{aligned} D(z)&= 1+d_1 z^{-1}=1-0.64z^{-1}\end{aligned}$$
(52)
$$\begin{aligned} \bar{{u}}(t)&= f(u(t))=c_1 u(t)+c_2 u^{2}(t)+c_3 u^{3}(t) \nonumber \\&= u(t)+0.5u^{2}(t)+0.25u^{3}(t) \end{aligned}$$
(53)

\(\varvec{\uptheta } =[a_1 ,a_2 ,b_1 ,b_2 ,c_2 ,c_3 ]=[-1.60,0.80, 0.85,0.65, 0.50,0.25]\) are the parameters to be identified. \(\{u(t)\}\) is taken as a persistently exciting signal sequence with zero mean and unit variance, and \(\{v(t)\}\) as a white noise sequence with zero mean and constant variance \(\sigma _v^2 \). The noise-to-signal ratio (NSR) is defined as the ratio of the standard deviation of the noise to that of the input, namely \(\mathrm{NSR}=\sqrt{\frac{\hbox {var}[v(t)]}{\hbox {var}[u(t)]}}\times 100\,\% \).
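A minimal simulation sketch of this system (names hypothetical), driving Eq. (7) directly with the coefficients of Eqs. (50)–(53), can be used to reproduce the experimental setup:

```python
import numpy as np

def simulate(L=3000, sigma_v=0.3, seed=0):
    """Generate data from the Hammerstein ARMAX example, Eqs. (49)-(53)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(L)             # zero mean, unit variance input
    v = sigma_v * rng.standard_normal(L)   # white noise
    u_bar = u + 0.5 * u**2 + 0.25 * u**3   # f(u), Eq. (53)
    a1, a2, b1, b2, d1 = -1.60, 0.80, 0.85, 0.65, -0.64
    y = np.zeros(L)
    for t in range(2, L):                  # Eq. (7) with na = nb = 2, nd = 1
        y[t] = (-a1 * y[t-1] - a2 * y[t-2]
                + b1 * u_bar[t-1] + b2 * u_bar[t-2]
                + v[t] + d1 * v[t-1])
    return u, y
```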

When \(\sigma _v^2 =0.3^{2}, \sigma _v^2 =0.5^{2}\) and \(\sigma _v^2 =0.7^{2}\), the corresponding NSRs are 16.34, 25.45 and 35.70 %, respectively. The identification accuracy is assessed by comparing the overall output response of the estimated model with the true output, and by the relative parameter estimation error

$$\begin{aligned} \delta =||\hat{\varvec{\uptheta }}(t)-\varvec{\uptheta } ||/||\varvec{\uptheta } ||\times 100\,\% \end{aligned}$$
(54)

For the model discussed above, the nonlinear RIV method is compared with the ARMAX-RLS method [23] under the three NSRs mentioned above with a sampling data length of 3,000. The relative parameter estimation errors of the nonlinear RIV and ARMAX-RLS methods with NSR 16.34, 25.45 and 35.70 % are compared in Figs. 3, 4 and 5, respectively (dashed line for the errors of the RLS method and solid line for the errors of the nonlinear RIV method). Tables 1, 3 and 5 record the parameter estimates and the corresponding relative errors of the nonlinear RIV method with NSR 16.34, 25.45 and 35.70 %, respectively, and Tables 2, 4 and 6 record those of the ARMAX-RLS method.

Fig. 3 Comparison of relative error between the nonlinear RIV method and ARMAX-RLS method (NSR \(=\) 16.34 %)

Fig. 4 Comparison of relative error between the nonlinear RIV method and ARMAX-RLS method (NSR \(=\) 25.45 %)

Fig. 5 Comparison of relative error between the nonlinear RIV method and ARMAX-RLS method (NSR \(=\) 35.70 %)

Figure 3 shows the evolution of the relative error, computed with Eq. (54), for the nonlinear RIV method and the ARMAX-RLS method when the NSR is 16.34 %. When the sampling time is between 0 and 400, the identification error of the ARMAX-RLS method is smaller than that of the nonlinear RIV method. Afterward, however, the identification error of the nonlinear RIV method declines and stays below that of the ARMAX-RLS method. This shows that the advantage of the proposed method may not be obvious at first, but becomes more and more pronounced as the sampling time increases.

Table 1 Identification results of the nonlinear RIV method (NSR \(=\) 16.34 %)
Table 2 Identification results of the ARMAX-RLS method (NSR \(=\) 16.34 %)

Tables 1 and 2 show the concrete parameter values for NSR \(=\) 16.34 %. \(\theta _1 ,\theta _2 ,\theta _3 ,\theta _4 \), which are obtained directly through Eqs. (35)–(37), represent the parameters \(a_1 ,a_2 ,b_1 ,b_2 \), respectively. \(\theta _5 ,\theta _6 ,\theta _7 ,\theta _8 \) are not parameters of the real system but are obtained by linear transformation; they represent \(b_1 c_2 ,b_2 c_2 ,b_1 c_3 ,b_2 c_3 \), respectively. Then \(c_2 ,c_3 \) can be recovered as follows:

$$\begin{aligned} c_2&= (\theta _5 +\theta _6 )/(\theta _3 +\theta _4 ) \nonumber \\ c_3&= (\theta _7 +\theta _8 )/(\theta _3 +\theta _4 ) \end{aligned}$$
(55)
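A small sketch (hypothetical helper, assuming the \(\theta \) ordering above) of this back-transformation, Eq. (55), together with the relative error \(\delta \) of Eq. (54):

```python
import numpy as np

def recover_and_score(theta_hat, theta_true):
    """theta_hat = [a1, a2, b1, b2, b1*c2, b2*c2, b1*c3, b2*c3];
    returns [a1, a2, b1, b2, c2, c3] and delta of Eq. (54) in percent."""
    t = theta_hat
    c2 = (t[4] + t[5]) / (t[2] + t[3])   # Eq. (55)
    c3 = (t[6] + t[7]) / (t[2] + t[3])
    est = np.array([t[0], t[1], t[2], t[3], c2, c3])
    delta = np.linalg.norm(est - theta_true) / np.linalg.norm(theta_true)
    return est, 100.0 * delta
```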

After 3,000 iterations, the final error of the nonlinear RIV method is 2.54 %, while that of the ARMAX-RLS method is 3.32 %.

The final parameter identification result is:

$$\begin{aligned}&{\hat{\varvec{\uptheta }}}_\mathrm{RIV} =[-1.5971\quad 0.7987\quad 0.8835\quad 0.6613\\&\quad 0.4975\quad 0.2224]\\&{\hat{\varvec{\uptheta }}}_{\mathrm{ARMAX}\text {-}\mathrm{RLS}} =[-1.6036\quad 0.8050\quad 0.8821\quad 0.6249\\&\quad 0.5012\quad 0.2392] \end{aligned}$$

Figure 4 depicts the evolution of the relative error for the nonlinear RIV method and the ARMAX-RLS method with 25.45 % NSR. From Fig. 4, it can be seen that the advantage of the nonlinear RIV method is still significant: its error curve declines sharply when the sampling time is about 300 and stays below the error curve of the ARMAX-RLS method afterward.

Table 3 Identification results of the nonlinear RIV method (NSR \(=\) 25.45 %)
Table 4 Identification results of the ARMAX-RLS method (NSR \(=\) 25.45 %)

Tables 3 and 4 show the concrete parameter values for NSR \(=\) 25.45 %. After 3,000 iterations, the final error of the nonlinear RIV method is 3.73 %, while that of the ARMAX-RLS method is 5.51 %.

Based on Eq. (55), the final parameter identification result is:

$$\begin{aligned}&{\hat{\varvec{\uptheta }}}_{\mathrm{RIV}} =[-1.6030\quad 0.8022\quad 0.8957\quad 0.6428\\&\quad 0.4920\quad 0.2302]\\&{\hat{\varvec{\uptheta }}}_{\mathrm{ARMAX}\text {-}\mathrm{RLS}} =[-1.5965\quad 0.7973\quad 0.8901\\&\quad 0.6126 \quad 0.5140 \quad 0.2891]. \end{aligned}$$

Figure 5 depicts the evolution of the relative error for the nonlinear RIV method and the ARMAX-RLS method when the NSR is 35.70 %. From Fig. 5, it can be seen that the advantage of the proposed method is not significant at first: the two curves overlap while the sampling time is between 0 and 700, indicating that the two methods show their advantages at different sampling times. When the sampling time exceeds 700, the error curve of the nonlinear RIV method stays below that of the ARMAX-RLS method throughout. The identification results of the nonlinear RIV method are thus better than those of the ARMAX-RLS method for all three NSRs.

Tables 5 and 6 show the concrete parameter values for NSR \(=\) 35.70 %. After 3,000 iterations, the final error of the nonlinear RIV method is 4.25 %, while that of the ARMAX-RLS method is 6.17 %.

Table 5 Identification results of the nonlinear RIV method (NSR \(=\) 35.70 %)
Table 6 Identification results of the ARMAX-RLS method (NSR \(=\) 35.70 %)

The final parameter identification result is:

$$\begin{aligned}&{\hat{\varvec{\uptheta }}}_\mathrm{RIV} =[-1.5948\quad 0.7951\quad 0.9036\quad 0.6498\\&\quad 0.4918\quad 0.1995]\\&{\hat{\varvec{\uptheta }}}_{\mathrm{ARMAX}\text {-}\mathrm{RLS}} =[-1.5943\quad 0.7961\quad 0.9224\quad \\&\quad 0.6530\quad 0.4865\quad 0.1907]. \end{aligned}$$

From the identification results above, the following conclusions can be drawn:

(1) The proposed nonlinear RIV method is effective for Hammerstein system identification, and its identification errors under the three NSRs fall below those of the ARMAX-RLS method after sampling times of about 400, 300 and 700, respectively. From Figs. 3, 4 and 5, the identification error curve of the nonlinear RIV method lies essentially below that of the ARMAX-RLS method over the whole identification course. After 3,000 iterations, the identification error of the nonlinear RIV method is lower than that of the ARMAX-RLS method by 30.7 % when the NSR is 16.34 %, by 47.7 % when the NSR is 25.45 %, and by 45.2 % when the NSR is 35.70 %. So the larger the NSR, the greater the advantage of the proposed method over the ARMAX-RLS method.

(2) As the NSR increases, the identification error grows, which means the proposed method is best suited to small noise levels; when the noise becomes larger, the identification results deteriorate or the system may not be identifiable at all. Nevertheless, the nonlinear RIV method remains superior to the ARMAX-RLS method under noise interference.

(3) As shown in the error curves, the parameter estimates converge very quickly at the beginning of identification; the cutoff point appears at about 1,000 iterations. After 1,000 iterations, the convergence slows as the number of iterations increases, and some steady-state error remains between the identified and real parameters under the different NSRs.

6 Conclusions

A nonlinear recursive instrumental variable method for the identification of nonlinear systems has been presented. It is applied to a Hammerstein ARMAX model under different NSRs and compared in detail with the ARMAX-RLS method. The results show that the nonlinear RIV method is not only effective but also superior to the ARMAX-RLS method in identification accuracy and convergence speed, especially under colored noise. The ARMAX-RLS method is not applicable when the noise model is unknown, whereas the proposed method still is. In other words, the nonlinear RIV method is more flexible for identifying nonlinear systems with colored noise.


The procedure of the proposed method and its mean square convergence for the identification of Hammerstein models have also been established; further convergence analysis is worth future research.