1 Introduction

The parameter estimation theory and identification techniques have been studied for decades [1–3] and have many applications in networked control [4, 5], signal processing and filtering [6, 7], system modeling [8–10] and so on. In general, existing parameter identification methods can be roughly divided into two categories: recursive algorithms [11–13], which compute the parameter estimates by using new information at each step as time increases, and iterative algorithms [14–16], which use a batch of data to update the parameter estimates. For example, Hu et al. [17] proposed two recursive extended least-squares parameter estimation algorithms for Wiener nonlinear systems with moving average noises by means of the over-parameterization principle; Vörös [18] presented a least-squares-based iterative algorithm for three-block models with nonlinear static, linear dynamic and nonlinear dynamic blocks; Chen and Ding [19] developed a hierarchical least-squares identification algorithm for Hammerstein nonlinear controlled autoregressive systems by using the hierarchical identification principle.

Finding an appropriate and simple model for a nonlinear system is a complex problem in the area of control. Typical nonlinear systems include the Wiener system, in which a linear time-invariant block is followed by a memoryless nonlinear block, and the Hammerstein system, in which the memoryless nonlinear block comes first [20–22]; the identification of such nonlinear systems remains an active research topic [23–25], and a large amount of work has been published in this field [26–28]. Recently, Wang [29] presented a filtering and auxiliary model-based recursive least-squares algorithm and a filtering and auxiliary model-based least-squares iterative identification algorithm for Hammerstein nonlinear systems; Zhang [30] derived a recursive least-squares identification algorithm based on the bias compensation technique for multi-input single-output systems with colored noises; Ding et al. [31] proposed a recursive least-squares algorithm for estimating the parameters of nonlinear systems based on model decomposition.

Over-parameterization algorithms are common and useful for the identification of nonlinear systems [32–34]. In [34], Ding and Chen transformed the nonlinear system into a pseudo-linear regression identification model by the over-parameterization method and proposed an iterative least-squares algorithm and a recursive least-squares algorithm for Hammerstein nonlinear ARMAX systems. The drawback, however, is that redundant parameters exist and the over-parameterization-based least-squares algorithms have a heavy computational cost. Motivated by the over-parameterization algorithms, this paper develops a multi-innovation algorithm using the key-term separation principle for Hammerstein multi-input multi-output output-error moving average (H-MIMO-OEMA) systems, i.e.,

$$\begin{aligned} \varvec{y}(t)= & {} \varvec{A}^{-1}(z)\varvec{B}(z){\bar{\varvec{u}}}(t)+\varvec{D}(z)\varvec{v}(t), \nonumber \\ \varvec{A}(z):= & {} \varvec{I}+\varvec{A}_1z^{-1}+\varvec{A}_2z^{-2}\nonumber \\&+\cdots +\varvec{A}_{n_a}z^{-n_a}\in \mathbb {R}^{m\times m},\ \varvec{A}_l=[a^{l}_{ij}]\nonumber \\&\in \mathbb {R}^{m\times m},\nonumber \\ \varvec{B}(z):= & {} \varvec{B}_0+\varvec{B}_1z^{-1}+\varvec{B}_2z^{-2}\nonumber \\&+\cdots +\varvec{B}_{n_b}z^{-n_b}\in \mathbb {R}^{m\times m},\ \varvec{B}_l=[b^{l}_{ij}]\nonumber \\&\in \mathbb {R}^{m\times m},\nonumber \\ \varvec{D}(z):= & {} \varvec{I}+\varvec{D}_1z^{-1}+\varvec{D}_2z^{-2}\nonumber \\&+\cdots +\varvec{D}_{n_d}z^{-n_d}\in \mathbb {R}^{m\times m},\ \varvec{D}_l=[d^{l}_{ij}]\nonumber \\&\in \mathbb {R}^{m\times m}, \end{aligned}$$
(1)

where \({\bar{\varvec{u}}}(t)=[{\bar{u}}_1(t), {\bar{u}}_2(t), \ldots , {\bar{u}}_m(t)]^{\mathrm{T}}\in \mathbb {R}^m\) is the output vector of the nonlinear part, \(\varvec{y}(t)\in \mathbb {R}^m\) is the output vector, \(\varvec{v}(t)\in \mathbb {R}^m\) is an additive noise with zero mean, and \({\bar{u}}_i(t)\) is a linear combination of known basis functions \(f_1, f_2, \ldots , f_{n_c}\) with unknown coefficients \(c_{ij}\), i.e.,

$$\begin{aligned} {\bar{u}}_i(t)= & {} c_{i1}f_1(u_i(t))+c_{i2}f_2(u_i(t))\nonumber \\&+\cdots +c_{in_c}f_{n_c}(u_i(t)). \end{aligned}$$
(2)

\(u_i(t)\) is the ith element of the system input vector \(\varvec{u}(t)\); \(\varvec{A}(z)\), \(\varvec{B}(z)\) and \(\varvec{D}(z)\) are polynomial matrices in the unit backward shift operator \(z^{-1}\): \(z^{-1}y(t)=y(t-1)\). Assume that the orders \(n_a\), \(n_b\), \(n_c\) and \(n_d\) are known and that \(\varvec{y}(t)=\mathbf{0}\), \(\varvec{u}(t)=\mathbf{0}\) and \(\varvec{v}(t)=\mathbf{0}\) for \(t\leqslant 0\).

Recently, Shen and Ding [35] considered the identification problem of Hammerstein multi-input multi-output output-error (H-MIMO-OE) systems with white noise. Building on the work in [35], this paper extends that method from H-MIMO-OE systems to H-MIMO-OEMA systems with moving average noise, presents the multi-innovation extended stochastic gradient algorithm for H-MIMO-OEMA systems and analyzes the performance of the algorithm for different innovation lengths p. The main contributions of this paper are as follows.

  • For H-MIMO-OEMA systems, this paper derives a hierarchical extended stochastic gradient (H-ESG) algorithm. In order to improve convergence rates, a hierarchical multi-innovation extended stochastic gradient (H-MI-ESG) algorithm is presented by increasing the innovation length.

  • Different from the over-parameterization identification methods, the parameter vector/matrix in the proposed algorithm does not involve products of the parameters of the linear and nonlinear parts, so no redundant parameters need to be estimated.

The rest of this paper is organized as follows. Section 2 introduces the identification problems for H-MIMO-OEMA systems using the key-term separation principle. Sections 3 and 4 discuss the hierarchical extended stochastic gradient identification algorithm and the hierarchical multi-innovation extended stochastic gradient algorithm for the H-MIMO-OEMA systems. Section 5 gives an illustrative example to show the effectiveness of the proposed algorithm. Finally, we offer some concluding remarks in Sect. 6.

2 Problem formulation

Let \(\varvec{I}\) represent an identity matrix of appropriate size; the norm of a matrix \(\varvec{X}\) is defined by \(\Vert \varvec{X}\Vert ^2:=\mathrm{tr}[\varvec{X}\varvec{X}^{\mathrm{T}}]\), and \(\varvec{f}(u):=[f_1(u), f_2(u), \ldots , f_{n_c}(u)]^{\mathrm{T}}\in \mathbb {R}^{n_c}\) collects the basis functions. Define the parameter vector \(\varvec{\theta }\) and the information matrix \(\varvec{\varPsi }(t)\) of the nonlinear part as

$$\begin{aligned}&\begin{aligned} \varvec{\theta }&:=[c_{11}, c_{12}, \ldots , c_{1n_c}, c_{21}, c_{22}, \ldots ,\\&\quad \,\,\, c_{2n_c}, \ldots , c_{m1}, c_{m2}, \ldots , c_{mn_c}]^{\mathrm{T}}\in \mathbb {R}^{mn_c}, \end{aligned}\\&\begin{aligned} \varvec{\varPsi }(t)&:=\mathrm{blockdiag}[\varvec{f}(u_1(t)), \varvec{f}(u_2(t)), \ldots ,\\&\quad \,\,\, \varvec{f}(u_m(t))]\in \mathbb {R}^{(mn_c)\times m}. \end{aligned} \end{aligned}$$

According to Eq. (2), the output vector \({\bar{\varvec{u}}}(t)\) of the nonlinear part can be expressed as

$$\begin{aligned} {\bar{\varvec{u}}}(t)=\varvec{\varPsi }^{\mathrm{T}}(t)\varvec{\theta }. \end{aligned}$$
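As a concrete illustration of \({\bar{\varvec{u}}}(t)=\varvec{\varPsi }^{\mathrm{T}}(t)\varvec{\theta }\), the following sketch builds the block-diagonal matrix \(\varvec{\varPsi }(t)\) for \(m=2\) channels and \(n_c=2\) basis functions. The basis \(f_1(u)=u^2\), \(f_2(u)=u\) and the coefficients are taken from the example in Sect. 5, while the input sample is a hypothetical stand-in.

```python
import numpy as np

m, n_c = 2, 2                              # channels, basis functions
f = [lambda u: u ** 2, lambda u: u]        # basis (f_1, f_2) from Sect. 5

def make_Psi(u):
    """Block-diagonal information matrix Psi(t) in R^{(m*n_c) x m}."""
    Psi = np.zeros((m * n_c, m))
    for i, ui in enumerate(u):
        Psi[i * n_c:(i + 1) * n_c, i] = [fk(ui) for fk in f]
    return Psi

theta = np.array([-2.21, 1.21, 0.76, 1.29])   # [c_11, c_12, c_21, c_22]^T
u_t = np.array([0.5, -1.0])                   # hypothetical input sample u(t)

u_bar = make_Psi(u_t).T @ theta               # u_bar(t) = Psi^T(t) theta
# channel-wise check against (2): u_bar_i = c_i1 f_1(u_i) + c_i2 f_2(u_i)
direct = np.array([theta[n_c * i] * f[0](u_t[i]) + theta[n_c * i + 1] * f[1](u_t[i])
                   for i in range(m)])
```

Since \(\varvec{\varPsi }^{\mathrm{T}}(t)\in \mathbb {R}^{m\times (mn_c)}\), the product with \(\varvec{\theta }\in \mathbb {R}^{mn_c}\) yields the m-dimensional vector \({\bar{\varvec{u}}}(t)\), and the block-diagonal structure keeps the channels decoupled.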

Define the inner vector (i.e., the noise-free output vector)

$$\begin{aligned} \varvec{x}(t)= & {} \varvec{A}^{-1}(z)\varvec{B}(z){\bar{\varvec{u}}}(t) \nonumber \\= & {} [\varvec{I}-\varvec{A}(z)]\varvec{x}(t)+\varvec{B}(z){\bar{\varvec{u}}}(t)\nonumber \\= & {} [\varvec{I}-\varvec{A}(z)]\varvec{x}(t)+\varvec{B}_0\varvec{\varPsi }^{\mathrm{T}}(t)\varvec{\theta }\nonumber \\&+\,\varvec{B}_1\varvec{\varPsi }^{\mathrm{T}}(t-1)\varvec{\theta }\!+\!\cdots \!+\!\varvec{B}_{n_b}\varvec{\varPsi }^{\mathrm{T}}(t-n_b)\varvec{\theta }. \end{aligned}$$
(3)

According to Eq. (3), it is clear that products of the parameters \(b_{ij}^l\) and \(c_{ij}\) appear. Therefore, to obtain unique parameter estimates, we take \({\bar{\varvec{u}}}(t)\) as the key term, fix \(\varvec{B}_0=\varvec{I}\) and define the system parameter matrices \(\varvec{\vartheta }_s\) and \(\varvec{\vartheta }\) as

$$\begin{aligned} \varvec{\vartheta }_s^{\mathrm{T}}&:=[\varvec{A}_1, \varvec{A}_2, \ldots , \varvec{A}_{n_a}, \varvec{B}_1, \varvec{B}_2, \ldots , \varvec{B}_{n_b}]\nonumber \\&\quad \in \mathbb {R}^{m\times (mn_a+mn_b)}, \nonumber \\ \varvec{\vartheta }^{\mathrm{T}}&:=[\varvec{\vartheta }_s^{\mathrm{T}}, \varvec{D}_1, \varvec{D}_2, \ldots , \varvec{D}_{n_d}]\nonumber \\&\quad \in \mathbb {R}^{m\times mn},\ n:=n_a+n_b+n_d, \end{aligned}$$

and the information vectors \(\varvec{\phi }_s(t)\) and \(\varvec{\phi }(t)\) as

$$\begin{aligned} \varvec{\phi }_s(t)&:=[-\varvec{x}^{\mathrm{T}}(t-1), -\varvec{x}^{\mathrm{T}}(t-2), \ldots , \nonumber \\&\quad -\varvec{x}^{\mathrm{T}}(t-n_a), {\bar{\varvec{u}}}^{\mathrm{T}}(t-1), {\bar{\varvec{u}}}^{\mathrm{T}}(t-2), \ldots ,\nonumber \\&\quad {\bar{\varvec{u}}}^{\mathrm{T}}(t-n_b)]^{\mathrm{T}}\in \mathbb {R}^{(mn_a+mn_b)}, \nonumber \\ \varvec{\phi }(t)&:=[\varvec{\phi }_s^{\mathrm{T}}(t), \varvec{v}^{\mathrm{T}}(t-1), \varvec{v}^{\mathrm{T}}(t-2), \ldots ,\nonumber \\&\quad \varvec{v}^{\mathrm{T}}(t-n_d)]^{\mathrm{T}}\in \mathbb {R}^{mn}. \end{aligned}$$

Then, Eqs. (1) and (3) can be rewritten as

$$\begin{aligned} \varvec{x}(t)= & {} \varvec{\vartheta }_s^{\mathrm{T}}\varvec{\phi }_s(t)+{\bar{\varvec{u}}}(t), \end{aligned}$$
(4)
$$\begin{aligned} \varvec{y}(t)= & {} \varvec{x}(t)+[\varvec{D}(z)-\varvec{I}]\varvec{v}(t)+\varvec{v}(t) \nonumber \\= & {} \varvec{\vartheta }^{\mathrm{T}}\varvec{\phi }(t)+\varvec{\varPsi }^{\mathrm{T}}(t)\varvec{\theta }+\varvec{v}(t). \end{aligned}$$
(5)

Compared with the over-parameterization identification models in [34], the identification model (4)–(5) does not involve products of the parameters and avoids estimating redundant parameters. Next, we investigate the hierarchical extended stochastic gradient identification algorithm and the hierarchical multi-innovation extended stochastic gradient algorithm for the H-MIMO-OEMA system.
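To make the identification model concrete, the following sketch simulates one step of (4)–(5) for \(m=2\) and \(n_a=n_b=n_d=1\) (so \(\varvec{B}_0=\varvec{I}\)) and checks that the simulated output agrees with the regression form \(\varvec{y}(t)=\varvec{\vartheta }^{\mathrm{T}}\varvec{\phi }(t)+{\bar{\varvec{u}}}(t)+\varvec{v}(t)\); the matrix values are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 2
A1 = np.array([[0.3, 0.1], [0.0, 0.2]])   # hypothetical A_1
B1 = np.array([[0.5, 0.0], [0.2, 0.4]])   # hypothetical B_1
D1 = np.array([[0.1, 0.0], [0.0, 0.1]])   # hypothetical D_1

def step(x_prev, ubar, ubar_prev, v, v_prev):
    """One step of (4)-(5) with B_0 = I:
    x(t) = -A1 x(t-1) + ubar(t) + B1 ubar(t-1),
    y(t) = x(t) + D1 v(t-1) + v(t)."""
    x = -A1 @ x_prev + ubar + B1 @ ubar_prev
    y = x + D1 @ v_prev + v
    return x, y

# one step from nonzero past values
x_prev, ubar_prev, v_prev = rng.normal(size=m), rng.normal(size=m), rng.normal(size=m)
ubar, v = rng.normal(size=m), 0.1 * rng.normal(size=m)
x, y = step(x_prev, ubar, ubar_prev, v, v_prev)

# regression form (5): y = vartheta^T phi + ubar + v
vartheta_T = np.hstack([A1, B1, D1])                 # vartheta^T in R^{m x mn}
phi = np.concatenate([-x_prev, ubar_prev, v_prev])   # phi(t) in R^{mn}
y_reg = vartheta_T @ phi + ubar + v
```

Note the sign convention: \(\varvec{\phi }_s(t)\) carries \(-\varvec{x}(t-1)\), so \(\varvec{\vartheta }_s^{\mathrm{T}}\varvec{\phi }_s(t)\) reproduces the \(-\varvec{A}_1\varvec{x}(t-1)\) term of (3).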

3 The hierarchical extended stochastic gradient identification algorithm

According to Eq. (5), define two fictitious output vectors,

$$\begin{aligned}&\varvec{\xi }_1(t):=\varvec{y}(t)-\varvec{\varPsi }^{\mathrm{T}}(t)\varvec{\theta }\in \mathbb {R}^m, \nonumber \\&\varvec{\xi }_2(t):=\varvec{y}(t)-\varvec{\vartheta }^{\mathrm{T}}\varvec{\phi }(t)\in \mathbb {R}^m, \end{aligned}$$

and obtain two subsystems:

$$\begin{aligned} {\mathrm{S}}_1: \ \varvec{\xi }_1(t)= & {} \varvec{\vartheta }^{\mathrm{T}}\varvec{\phi }(t)+\varvec{v}(t), \nonumber \\ {\mathrm{S}}_2: \ \varvec{\xi }_2(t)= & {} \varvec{\varPsi }^{\mathrm{T}}(t)\varvec{\theta }+\varvec{v}(t). \end{aligned}$$

Notice that the parameters associated with both subsystems are the \(c_{ij}\), which appear in \(\varvec{\phi }(t)\) (through \({\bar{\varvec{u}}}(t-i)\)) and in \(\varvec{\theta }\). Define two cost functions:

$$\begin{aligned}&J_1(\varvec{\vartheta }):=\frac{1}{2}\Vert \varvec{\xi }_1(t)-\varvec{\vartheta }^{\mathrm{T}}\varvec{\phi }(t)\Vert ^2, \nonumber \\&J_2(\varvec{\theta }):=\frac{1}{2}\Vert \varvec{\xi }_2(t)-\varvec{\varPsi }^{\mathrm{T}}(t)\varvec{\theta }\Vert ^2. \end{aligned}$$

The difficulty is that \(\varvec{\phi }(t)\) contains the unknown vectors \(\varvec{x}(t-i)\), \({\bar{\varvec{u}}}(t-i)\) and \(\varvec{v}(t-i)\), while the output vectors \(\varvec{\xi }_1(t)\) and \(\varvec{\xi }_2(t)\) contain the unknown parameter matrix \(\varvec{\vartheta }\) and vector \(\varvec{\theta }\), respectively. The solution is to apply the hierarchical identification principle: replace the unknown vectors \(\varvec{x}(t-i)\), \({\bar{\varvec{u}}}(t-i)\) and \(\varvec{v}(t-i)\) with their corresponding estimates \({\hat{\varvec{x}}}(t-i)\), \(\hat{{\bar{\varvec{u}}}}(t-i)\) and \({\hat{\varvec{v}}}(t-i)\), and define the information vectors \({\hat{\varvec{\phi }}}_s(t)\) and \({\hat{\varvec{\phi }}}(t)\) as

$$\begin{aligned} {\hat{\varvec{\phi }}}_s(t)&:= [-{\hat{\varvec{x}}}^{\mathrm{T}}(t-1), -{\hat{\varvec{x}}}^{\mathrm{T}}(t-2), \ldots , -{\hat{\varvec{x}}}^{\mathrm{T}}(t-n_a), \nonumber \\&\quad \,\,\,\hat{{\bar{\varvec{u}}}}^{\mathrm{T}}(t-1), \hat{{\bar{\varvec{u}}}}^{\mathrm{T}}(t-2), \ldots ,\nonumber \\&\quad \,\,\, \hat{{\bar{\varvec{u}}}}^{\mathrm{T}}(t-n_b)]^{\mathrm{T}}\in \mathbb {R}^{(mn_a+mn_b)}, \end{aligned}$$
(6)
$$\begin{aligned} {\hat{\varvec{\phi }}}(t)&:=[{\hat{\varvec{\phi }}}_s^{\mathrm{T}}(t), {\hat{\varvec{v}}}^{\mathrm{T}}(t-1), {\hat{\varvec{v}}}^{\mathrm{T}}(t-2), \ldots ,\nonumber \\&\quad \,\,\,{\hat{\varvec{v}}}^{\mathrm{T}}(t-n_d)]^{\mathrm{T}}\in \mathbb {R}^{mn}. \end{aligned}$$
(7)

Using the estimate \({\hat{\varvec{\theta }}}(t)\) of \(\varvec{\theta }\), the estimate \(\hat{{\bar{\varvec{u}}}}(t)\) of \({\bar{\varvec{u}}}(t)\) and \({\hat{\varvec{\xi }}}_1(t)\) of \(\varvec{\xi }_1(t)\) can be computed by

$$\begin{aligned}&\hat{{\bar{\varvec{u}}}}(t)=\varvec{\varPsi }^{\mathrm{T}}(t){\hat{\varvec{\theta }}}(t), \end{aligned}$$
(8)
$$\begin{aligned}&{\hat{\varvec{\xi }}}_1(t)=\varvec{y}(t)-\varvec{\varPsi }^{\mathrm{T}}(t){\hat{\varvec{\theta }}}(t-1), \end{aligned}$$
(9)
$$\begin{aligned}&\varvec{\varPsi }(t)=\mathrm{blockdiag}[\varvec{f}(u_1(t)), \varvec{f}(u_2(t)), \ldots , \varvec{f}(u_m(t))], \end{aligned}$$
(10)
$$\begin{aligned} {\hat{\varvec{\theta }}}(t):= & {} [{\hat{c}}_{11}(t), {\hat{c}}_{12}(t), \ldots , {\hat{c}}_{1n_c}(t), {\hat{c}}_{21}(t),\nonumber \\&{\hat{c}}_{22}(t), \ldots , {\hat{c}}_{2n_c}(t), \ldots , \nonumber \\&{\hat{c}}_{m1}(t), {\hat{c}}_{m2}(t), \ldots , {\hat{c}}_{mn_c}(t)]^{\mathrm{T}}\in \mathbb {R}^{mn_c}. \end{aligned}$$
(11)

Replacing \(\varvec{\phi }_s(t)\), \({\bar{\varvec{u}}}(t)\) and \(\varvec{\vartheta }_s\) in (4) with their estimates \({\hat{\varvec{\phi }}}_s(t)\), \(\hat{{\bar{\varvec{u}}}}(t)\) and \({\hat{\varvec{\vartheta }}}_s(t)\), the estimate \({\hat{\varvec{x}}}(t)\) of \(\varvec{x}(t)\) can be computed by

$$\begin{aligned} {\hat{\varvec{x}}}(t)={\hat{\varvec{\vartheta }}}_s^{\mathrm{T}}(t){\hat{\varvec{\phi }}}_s(t)+\hat{{\bar{\varvec{u}}}}(t), \end{aligned}$$
(12)
$$\begin{aligned} {\hat{\varvec{\vartheta }}}_s^{\mathrm{T}}(t):= & {} [{\hat{\varvec{A}}}_1(t), {\hat{\varvec{A}}}_2(t), \ldots , {\hat{\varvec{A}}}_{n_a}(t), {\hat{\varvec{B}}}_1(t), {\hat{\varvec{B}}}_2(t), \ldots ,\nonumber \\&{\hat{\varvec{B}}}_{n_b}(t)]\in \mathbb {R}^{m\times (mn_a+mn_b)}. \end{aligned}$$
(13)

Replacing \(\varvec{\vartheta }\), \(\varvec{\phi }(t)\) and \({\bar{\varvec{u}}}(t)\) in (5) with their estimates \({\hat{\varvec{\vartheta }}}(t)\), \({\hat{\varvec{\phi }}}(t)\) and \(\hat{{\bar{\varvec{u}}}}(t)\), the estimate \({\hat{\varvec{v}}}(t)\) of \(\varvec{v}(t)\) and \({\hat{\varvec{\xi }}}_2(t)\) of \(\varvec{\xi }_2(t)\) can be computed by

$$\begin{aligned}&{\hat{\varvec{v}}}(t)=\varvec{y}(t)-{\hat{\varvec{\vartheta }}}^{\mathrm{T}}(t){\hat{\varvec{\phi }}}(t)-\hat{{\bar{\varvec{u}}}}(t), \end{aligned}$$
(14)
$$\begin{aligned}&{\hat{\varvec{\xi }}}_2(t)=\varvec{y}(t)-{\hat{\varvec{\vartheta }}}^{\mathrm{T}}(t){\hat{\varvec{\phi }}}(t), \end{aligned}$$
(15)
$$\begin{aligned} {\hat{\varvec{\vartheta }}}^{\mathrm{T}}(t):= & {} [{\hat{\varvec{\vartheta }}}_s^{\mathrm{T}}(t), {\hat{\varvec{D}}}_1(t), {\hat{\varvec{D}}}_2(t), \ldots ,\nonumber \\&{\hat{\varvec{D}}}_{n_d}(t)]\in \mathbb {R}^{m\times mn}. \end{aligned}$$
(16)

Minimizing the cost functions \(J_1(\varvec{\vartheta })\) and \(J_2(\varvec{\theta })\) by the negative gradient search, we obtain the ESG algorithm:

$$\begin{aligned}&{\hat{\varvec{\vartheta }}}(t)={\hat{\varvec{\vartheta }}}(t-1)+\frac{{\hat{\varvec{\phi }}}(t)}{r_1(t)}\varvec{e}^{\mathrm{T}}_1(t), \end{aligned}$$
(17)
$$\begin{aligned}&\varvec{e}_1(t)={\hat{\varvec{\xi }}}_1(t)-{\hat{\varvec{\vartheta }}}^{\mathrm{T}}(t-1){\hat{\varvec{\phi }}}(t), \end{aligned}$$
(18)
$$\begin{aligned}&r_1(t)=r_1(t-1)+\Vert {\hat{\varvec{\phi }}}(t)\Vert ^2,\ r_1(0)=1, \end{aligned}$$
(19)
$$\begin{aligned}&{\hat{\varvec{\theta }}}(t)={\hat{\varvec{\theta }}}(t-1)+\frac{\varvec{\varPsi }(t)}{r_2(t)}\varvec{e}_2(t), \end{aligned}$$
(20)
$$\begin{aligned}&\varvec{e}_2(t)={\hat{\varvec{\xi }}}_2(t)-\varvec{\varPsi }^{\mathrm{T}}(t){\hat{\varvec{\theta }}}(t-1), \end{aligned}$$
(21)
$$\begin{aligned}&r_2(t)=r_2(t-1)+\Vert \varvec{\varPsi }(t)\Vert ^2,\ r_2(0)=1. \end{aligned}$$
(22)

Here, \(\varvec{e}_1(t)\in \mathbb {R}^m\) and \(\varvec{e}_2(t)\in \mathbb {R}^m\) are innovation vectors, each element of which is a scalar innovation at the current time. Equations (6)–(22) form the hierarchical extended stochastic gradient (H-ESG) identification algorithm for the H-MIMO-OEMA system. The steps of computing the estimates \({\hat{\varvec{\vartheta }}}(t)\) and \({\hat{\varvec{\theta }}}(t)\) by the H-ESG algorithm are as follows.

  1.

    Let \(t=1\), set the initial values \({\hat{\varvec{\vartheta }}}(0)=\mathbf{1}_{m\times (mn)}/p_0\), \({\hat{\varvec{\theta }}}(0)=\mathbf{1}_{mn_c}/p_0\), \({\hat{\varvec{x}}}(t)=\mathbf{0}\), \(\hat{{\bar{\varvec{u}}}}(t)=\mathbf{0}\), \({\hat{\varvec{v}}}(t)=\mathbf{0}\) for \(t\leqslant 0\) (\(p_0\) is a large number, e.g., \(p_0=10^6\)).

  2.

    Collect the input–output data \(u_i(t)\) and \(y_i(t)\), and form \(\varvec{\varPsi }(t)\) using (10).

  3.

    Form \({\hat{\varvec{\phi }}}_s(t)\) using (6) and \({\hat{\varvec{\phi }}}(t)\) using (7).

  4.

    Compute \({\hat{\varvec{\xi }}}_1(t)\), \(\varvec{e}_1(t)\) and \(r_1(t)\) using (9), (18) and (19), and update \({\hat{\varvec{\vartheta }}}(t)\) using (17).

  5.

    Compute \({\hat{\varvec{\xi }}}_2(t)\), \(\varvec{e}_2(t)\) and \(r_2(t)\) using (15), (21) and (22), and update \({\hat{\varvec{\theta }}}(t)\) using (20).

  6.

    Compute \(\hat{{\bar{\varvec{u}}}}(t)\) using (8), \({\hat{\varvec{x}}}(t)\) using (12) and \({\hat{\varvec{v}}}(t)\) using (14).

  7.

    Increase \(t\) by 1 and go to Step 2.
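The steps above can be sketched in code. The fragment below runs the H-ESG recursion on simulated data for \(m=2\), \(n_a=n_b=n_d=1\), \(n_c=2\); the system matrices and input signal are hypothetical stand-ins (only the basis and \(\varvec{\theta }\) values follow the example in Sect. 5), so this is an illustration of the update equations, not a reproduction of the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_c, n = 2, 2, 3                           # n = n_a + n_b + n_d, n_a = n_b = n_d = 1
A1 = np.array([[0.3, 0.1], [0.0, 0.2]])       # hypothetical system matrices
B1 = np.array([[0.5, 0.0], [0.2, 0.4]])
D1 = np.array([[0.1, 0.0], [0.0, 0.1]])
theta_true = np.array([-2.21, 1.21, 0.76, 1.29])
vartheta_true = np.hstack([A1, B1, D1]).T     # vartheta in R^{mn x m}
f = [lambda u: u ** 2, lambda u: u]

def make_Psi(u):                              # blockdiag information matrix (10)
    Psi = np.zeros((m * n_c, m))
    for i, ui in enumerate(u):
        Psi[i * n_c:(i + 1) * n_c, i] = [fk(ui) for fk in f]
    return Psi

p0 = 1e6                                      # Step 1: initial values
vartheta = np.ones((m * n, m)) / p0
theta = np.ones(m * n_c) / p0
r1 = r2 = 1.0
x_hat = ubar_hat = v_hat = np.zeros(m)        # estimates at time t-1
x_prev = ubar_prev = v_prev = np.zeros(m)     # true signals at time t-1

for t in range(1, 3001):
    u = rng.normal(size=m)                    # Step 2: collect data
    v = 0.1 * rng.normal(size=m)
    ubar = make_Psi(u).T @ theta_true
    x = -A1 @ x_prev + ubar + B1 @ ubar_prev  # simulate (3)
    y = x + D1 @ v_prev + v                   # simulate (1)

    Psi = make_Psi(u)
    phi = np.concatenate([-x_hat, ubar_hat, v_hat])  # Step 3: (6)-(7)

    xi1 = y - Psi.T @ theta                   # Step 4: (9), (18), (19), (17)
    e1 = xi1 - vartheta.T @ phi
    r1 += phi @ phi
    vartheta = vartheta + np.outer(phi, e1) / r1

    xi2 = y - vartheta.T @ phi                # Step 5: (15), (21), (22), (20)
    e2 = xi2 - Psi.T @ theta
    r2 += np.sum(Psi ** 2)
    theta = theta + Psi @ e2 / r2

    ubar_hat = Psi.T @ theta                  # Step 6: (8), (12), (14)
    x_hat = vartheta[:2 * m].T @ phi[:2 * m] + ubar_hat
    v_hat = y - vartheta.T @ phi - ubar_hat
    x_prev, ubar_prev, v_prev = x, ubar, v

delta = np.sqrt((np.linalg.norm(vartheta - vartheta_true) ** 2
                 + np.linalg.norm(theta - theta_true) ** 2)
                / (np.linalg.norm(vartheta_true) ** 2 + np.linalg.norm(theta_true) ** 2))
```

The normalized gains \(1/r_1(t)\) and \(1/r_2(t)\) shrink as the regressor energy accumulates, which is exactly why the convergence of this \(p=1\) scheme is slow and motivates the multi-innovation extension of Sect. 4.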

4 The hierarchical multi-innovation extended stochastic gradient algorithm

The H-ESG algorithm uses only the current data and thus has a slow convergence rate. In order to enhance the convergence rate, we derive a computationally efficient and accurate H-MI-ESG algorithm for the subsystems \({\mathrm{S}}_1\) and \({\mathrm{S}}_2\) by expanding the single-innovation vectors \(\varvec{e}_1(t)\) and \(\varvec{e}_2(t)\) into an innovation matrix \(\varvec{E}_1(p, t)\) and a stacked innovation vector \(\varvec{E}_2(p, t)\) as

$$\begin{aligned}&\varvec{E}_1(p,t)\nonumber \\&\quad :=\left[ \begin{array}{c} {\hat{\varvec{\xi }}}^{\mathrm{T}}_1(t)-{\hat{\varvec{\phi }}}^{\mathrm{T}}(t){\hat{\varvec{\vartheta }}}(t-1) \\ {\hat{\varvec{\xi }}}^{\mathrm{T}}_1(t-1)-{\hat{\varvec{\phi }}}^{\mathrm{T}}(t-1){\hat{\varvec{\vartheta }}}(t-1) \\ \vdots \\ {\hat{\varvec{\xi }}}^{\mathrm{T}}_1(t-p+1)-{\hat{\varvec{\phi }}}^{\mathrm{T}}(t-p+1){\hat{\varvec{\vartheta }}}(t-1) \end{array}\right] \nonumber \\&\qquad \in \mathbb {R}^{p\times m}, \end{aligned}$$
(23)
$$\begin{aligned}&\varvec{E}_2(p,t)\nonumber \\&\quad :=\left[ \begin{array}{c} {\hat{\varvec{\xi }}}_2(t)-\varvec{\varPsi }^{\mathrm{T}}(t){\hat{\varvec{\theta }}}(t-1) \\ {\hat{\varvec{\xi }}}_2(t-1)-\varvec{\varPsi }^{\mathrm{T}}(t-1){\hat{\varvec{\theta }}}(t-1) \\ \vdots \\ {\hat{\varvec{\xi }}}_2(t-p+1)-\varvec{\varPsi }^{\mathrm{T}}(t-p+1){\hat{\varvec{\theta }}}(t-1) \end{array}\right] \nonumber \\&\qquad \in \mathbb {R}^{mp}, \end{aligned}$$
(24)

where p represents the innovation length.

Define the information matrices \(\varvec{\varPhi }(p,t)\) and \(\varvec{\varOmega }(p,t)\) and the stacked output matrix \(\varvec{Y}_1(p,t)\) and vector \(\varvec{Y}_2(p,t)\) as

$$\begin{aligned} \varvec{\varPhi }(p,t)&:=[\varvec{\phi }(t), \varvec{\phi }(t-1), \ldots , \nonumber \\&\quad \varvec{\phi }(t-p+1)]\in \mathbb {R}^{mn\times p}, \nonumber \\ \varvec{\varOmega }(p,t)&:=[\varvec{\varPsi }(t), \varvec{\varPsi }(t-1), \ldots , \nonumber \\&\quad \varvec{\varPsi }(t-p+1)]\in \mathbb {R}^{(mn_c)\times (mp)}, \nonumber \\ \varvec{Y}_1(p,t)&:=[\varvec{\xi }_1(t), \varvec{\xi }_1(t-1), \ldots , \nonumber \\&\quad \varvec{\xi }_1(t-p+1)]^{\mathrm{T}}\in \mathbb {R}^{p\times m}, \nonumber \\ \varvec{Y}_2(p,t)&:=[\varvec{\xi }^{\mathrm{T}}_2(t), \varvec{\xi }^{\mathrm{T}}_2(t-1), \ldots , \nonumber \\&\quad \varvec{\xi }^{\mathrm{T}}_2(t-p+1)]^{\mathrm{T}}\in \mathbb {R}^{mp}. \end{aligned}$$
(25)

Similar difficulties arise here: \(\varvec{\varPhi }(p, t)\) contains the unknown vectors \(\varvec{\phi }(t-i)\), and \(\varvec{Y}_1(p, t)\) and \(\varvec{Y}_2(p, t)\) contain the unknown fictitious output vectors \(\varvec{\xi }_1(t-i)\) and \(\varvec{\xi }_2(t-i)\), respectively. Our approach is to replace the unmeasurable vectors \(\varvec{\phi }(t-i)\), \(\varvec{\xi }_1(t-i)\) and \(\varvec{\xi }_2(t-i)\) with their estimates \({\hat{\varvec{\phi }}}(t-i)\), \({\hat{\varvec{\xi }}}_1(t-i)\) and \({\hat{\varvec{\xi }}}_2(t-i)\), and define

$$\begin{aligned} {\hat{\varvec{\varPhi }}}(p,t)&:=[{\hat{\varvec{\phi }}}(t), {\hat{\varvec{\phi }}}(t-1), \ldots ,\nonumber \\&\quad {\hat{\varvec{\phi }}}(t-p+1)]\in \mathbb {R}^{(mn)\times p}, \end{aligned}$$
(26)
$$\begin{aligned} {\hat{\varvec{Y}}}_1(p,t)&:=[{\hat{\varvec{\xi }}}_1(t), {\hat{\varvec{\xi }}}_1(t-1),\ldots ,\nonumber \\&\quad {\hat{\varvec{\xi }}}_1(t-p+1)]^{\mathrm{T}}\in \mathbb {R}^{p\times m}, \end{aligned}$$
(27)
$$\begin{aligned} {\hat{\varvec{Y}}}_2(p,t)&:=[{\hat{\varvec{\xi }}}^{\mathrm{T}}_2(t), {\hat{\varvec{\xi }}}^{\mathrm{T}}_2(t-1), \ldots , \nonumber \\&\quad {\hat{\varvec{\xi }}}^{\mathrm{T}}_2(t-p+1)]^{\mathrm{T}}\in \mathbb {R}^{mp}. \end{aligned}$$
(28)

Referring to [36] and according to the ESG algorithm in (17)–(22) and Eqs. (23) and (24), the MI-ESG algorithm with the innovation length p can be expressed as

$$\begin{aligned}&{\hat{\varvec{\vartheta }}}(t)={\hat{\varvec{\vartheta }}}(t-1)+\frac{{\hat{\varvec{\varPhi }}}(p,t)}{r_1(t)}\varvec{E}_1(p,t),\end{aligned}$$
(29)
$$\begin{aligned}&\varvec{E}_1(p,t)={\hat{\varvec{Y}}}_1(p, t)-{\hat{\varvec{\varPhi }}}^{\mathrm{T}}(p,t){\hat{\varvec{\vartheta }}}(t-1), \end{aligned}$$
(30)
$$\begin{aligned}&r_1(t)=r_1(t-1)+\Vert {\hat{\varvec{\phi }}}(t)\Vert ^2,\ r_1(0)=1, \end{aligned}$$
(31)
$$\begin{aligned}&{\hat{\varvec{\theta }}}(t)={\hat{\varvec{\theta }}}(t-1)+\frac{\varvec{\varOmega }(p,t)}{r_2(t)}\varvec{E}_2(p,t),\end{aligned}$$
(32)
$$\begin{aligned}&\varvec{E}_2(p,t)={\hat{\varvec{Y}}}_2(p, t)-\varvec{\varOmega }^{\mathrm{T}}(p,t){\hat{\varvec{\theta }}}(t-1), \end{aligned}$$
(33)
$$\begin{aligned}&r_2(t)=r_2(t-1)+\Vert \varvec{\varPsi }(t)\Vert ^2,\ r_2(0)=1. \end{aligned}$$
(34)
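The stacked update can be checked against the single-innovation one. The sketch below forms \({\hat{\varvec{\varPhi }}}(p,t)\), \({\hat{\varvec{Y}}}_1(p,t)\) and \(\varvec{E}_1(p,t)\) for subsystem \({\mathrm{S}}_1\) from hypothetical stand-in data and verifies that with \(p=1\) the update (29)–(30) coincides with (17)–(18).

```python
import numpy as np

rng = np.random.default_rng(2)
m, mn, p = 2, 6, 3                     # hypothetical sizes, innovation length p = 3

phis = [rng.normal(size=mn) for _ in range(p)]    # phi-hat(t-i), i = 0..p-1
xis = [rng.normal(size=m) for _ in range(p)]      # xi1-hat(t-i)
vartheta = rng.normal(size=(mn, m))               # current estimate vartheta-hat(t-1)
r1 = 10.0                                         # current r_1(t) (stand-in)

Phi = np.column_stack(phis)                       # Phi-hat(p,t) in R^{mn x p}, (26)
Y1 = np.vstack([xi[np.newaxis, :] for xi in xis]) # Y1-hat(p,t) in R^{p x m}, (27)
E1 = Y1 - Phi.T @ vartheta                        # innovation matrix (30), R^{p x m}
vartheta_new = vartheta + Phi @ E1 / r1           # multi-innovation update (29)

# p = 1 special case: the stacked update reduces to the H-ESG step (17)-(18)
Phi1 = phis[0].reshape(-1, 1)
E1_p1 = xis[0][np.newaxis, :] - Phi1.T @ vartheta
stacked_p1 = vartheta + Phi1 @ E1_p1 / r1

e1 = xis[0] - vartheta.T @ phis[0]                # single innovation (18)
single = vartheta + np.outer(phis[0], e1) / r1    # single-innovation update (17)
```

Each row of \(\varvec{E}_1(p,t)\) is the innovation that the window's data at time \(t-i\) generate against the same estimate \({\hat{\varvec{\vartheta }}}(t-1)\), which is why reusing the past \(p\) data pairs speeds up convergence.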

Equations (6)–(16) and (25)–(34) form the hierarchical multi-innovation extended stochastic gradient (H-MI-ESG) algorithm for computing \({\hat{\varvec{\vartheta }}}(t)\) and \({\hat{\varvec{\theta }}}(t)\). The steps of computing \({\hat{\varvec{\vartheta }}}(t)\) and \({\hat{\varvec{\theta }}}(t)\) in the H-MI-ESG algorithm are as follows.

  1.

    Let \(t=1\), set the initial values \({\hat{\varvec{\vartheta }}}(0)=\mathbf{1}_{m\times (mn)}/p_0\), \({\hat{\varvec{\theta }}}(0)=\mathbf{1}_{mn_c}/p_0\), \({\hat{\varvec{x}}}(t)=\mathbf{0}\), \(\hat{{\bar{\varvec{u}}}}(t)=\mathbf{0}\), \({\hat{\varvec{v}}}(t)=\mathbf{0}\) for \(t\leqslant 0\) (\(p_0\) is a large number, e.g., \(p_0=10^6\)).

  2.

    Collect the input–output data \(u_i(t)\) and \(y_i(t)\); form \({\hat{\varvec{\phi }}}_s(t)\) using (6), \({\hat{\varvec{\phi }}}(t)\) using (7), \({\hat{\varvec{\varPhi }}}(p, t)\) using (26), \(\varvec{\varPsi }(t)\) using (10) and \(\varvec{\varOmega }(p, t)\) using (25).

  3.

    Compute \({\hat{\varvec{\xi }}}_1(t)\) using (9), form \({\hat{\varvec{Y}}}_1(p, t)\) using (27), compute \(\varvec{E}_1(p,t)\) and \(r_1(t)\) using (30) and (31), and update \({\hat{\varvec{\vartheta }}}(t)\) using (29).

  4.

    Compute \({\hat{\varvec{\xi }}}_2(t)\) using (15), form \({\hat{\varvec{Y}}}_2(p, t)\) using (28), compute \(\varvec{E}_2(p,t)\) and \(r_2(t)\) using (33) and (34), and update \({\hat{\varvec{\theta }}}(t)\) using (32).

  5.

    Compute \(\hat{{\bar{\varvec{u}}}}(t)\) using (8), \({\hat{\varvec{x}}}(t)\) using (12) and \({\hat{\varvec{v}}}(t)\) using (14).

  6.

    Increase \(t\) by 1 and go to Step 2.

The flowchart of computing the parameter estimates \({\hat{\varvec{\theta }}}(t)\) and \({\hat{\varvec{\vartheta }}}(t)\) in the H-MI-ESG algorithm is shown in Fig. 1.

Fig. 1

The flowchart of computing the estimates \({\hat{\varvec{\theta }}}(t)\) and \({\hat{\varvec{\vartheta }}}(t)\)

Table 1 The H-ESG estimates and errors

Moreover, in order to further improve the convergence rate of the H-MI-ESG algorithm, we can introduce a forgetting factor \(\lambda \), so that Eqs. (31) and (34) become

$$\begin{aligned} r_1(t)= & {} \lambda r_1(t-1)+\Vert {\hat{\varvec{\phi }}}(t)\Vert ^2,\ r_1(0)=1,\ 0<\lambda \leqslant 1, \nonumber \\ r_2(t)= & {} \lambda r_2(t-1)+\Vert \varvec{\varPsi }(t)\Vert ^2,\ r_2(0)=1. \end{aligned}$$
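A quick numerical sketch (with a hypothetical constant regressor norm \(g\) and \(\lambda =0.95\)) shows the effect of the forgetting factor: with \(\lambda =1\) as in (19), \(r_1(t)\) grows without bound and the gain \(1/r_1(t)\) vanishes, whereas with \(\lambda <1\) it approaches the finite limit \(g/(1-\lambda )\), keeping the algorithm responsive.

```python
# g stands in for a constant regressor norm ||phi-hat(t)||^2 (hypothetical value)
g = 4.0
lam = 0.95
r_plain = r_ff = 1.0
for _ in range(500):
    r_plain = r_plain + g        # (19): lambda = 1, r grows linearly
    r_ff = lam * r_ff + g        # forgetting-factor version of (31)
limit = g / (1.0 - lam)          # geometric limit of the discounted sum
```

After 500 steps the undiscounted \(r\) has grown to \(1+500g\), while the discounted one has essentially settled at \(g/(1-\lambda )\).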

When \(p=1\), the H-MI-ESG algorithm reduces to the H-ESG algorithm in (6)–(22).

Table 2 The H-MI-ESG estimates and errors with \(p=5\)
Fig. 2

The H-MI-ESG estimation errors \(\delta \) versus t

5 Example

Consider the following H-MIMO-OEMA system,

where \(f_1(u_i(t))=u_i^2(t)\) and \(f_2(u_i(t))=u_i(t)\), the parameter matrix \(\varvec{\vartheta }\) and the parameter vector \(\varvec{\theta }\) are

$$\begin{aligned}&\begin{aligned} \varvec{\vartheta }^{\mathrm{T}}&=[\varvec{A}_1, \varvec{B}_1, \varvec{D}_1]\nonumber \\&= \left[ \begin{array}{rrrrrr} 0.49 &{} -0.10 &{} 1.26 &{} 2.79 &{} -0.20 &{} 0 \\ 0.35 &{} -0.80 &{} 0.79 &{} 1.65 &{} 0 &{} 0.39 \end{array}\right] , \nonumber \\ \end{aligned}\\&\varvec{\theta }=[c_{11}, c_{12}, c_{21}, c_{22}]^{\mathrm{T}}\!=\![-2.21, 1.21, 0.76, 1.29]^{\mathrm{T}}. \end{aligned}$$

The input vector \(\varvec{u}(t)\) is taken as an uncorrelated persistent excitation signal vector with zero mean and unit variance, and \(\varvec{v}(t)\) as a white noise vector with zero mean and variances \(\sigma _1^2=0.10^2\) for \(v_1(t)\) and \(\sigma _2^2=0.10^2\) for \(v_2(t)\). Taking the forgetting factor \(\lambda =0.99\) and applying the H-MI-ESG algorithm to estimate the parameters of the example system, the parameter estimates and the estimation errors \(\delta \) are shown in Tables 1 and 2 for the innovation lengths \(p=1\) and \(p=5\), and the estimation errors versus t are shown in Fig. 2, where

$$\begin{aligned} \delta :=\sqrt{\frac{\Vert {\hat{\varvec{\vartheta }}}(t)-\varvec{\vartheta }\Vert ^2+\Vert {\hat{\varvec{\theta }}}(t)-\varvec{\theta }\Vert ^2}{\Vert \varvec{\vartheta }\Vert ^2+\Vert \varvec{\theta }\Vert ^2}}\times 100\,\%. \end{aligned}$$
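The error measure \(\delta \) can be computed directly with the Frobenius norm \(\Vert \varvec{X}\Vert ^2=\mathrm{tr}[\varvec{X}\varvec{X}^{\mathrm{T}}]\). The sketch below uses the true parameter values of this example together with hypothetical estimates in which every entry is perturbed by 0.01.

```python
import numpy as np

# true parameters of the example system
vartheta_T = np.array([[0.49, -0.10, 1.26, 2.79, -0.20, 0.00],
                       [0.35, -0.80, 0.79, 1.65, 0.00, 0.39]])
theta = np.array([-2.21, 1.21, 0.76, 1.29])

# hypothetical estimates: each entry off by 0.01
vartheta_T_hat = vartheta_T + 0.01
theta_hat = theta - 0.01

# np.linalg.norm defaults to the Frobenius norm for matrices
num = np.linalg.norm(vartheta_T_hat - vartheta_T) ** 2 \
    + np.linalg.norm(theta_hat - theta) ** 2
den = np.linalg.norm(vartheta_T) ** 2 + np.linalg.norm(theta) ** 2
delta = np.sqrt(num / den) * 100          # relative error in percent
```

With these stand-in estimates, \(\delta \approx 0.84\,\%\); the normalization by the true parameter norms makes errors comparable across the matrix and vector parts.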

The computational efficiency of the H-MI-ESG algorithm is shown in Table 3.

Table 3 The computational efficiency of the H-MI-ESG algorithm

From Tables 1 and 2 and Fig. 2, we can draw the following conclusions.

  • The H-MI-ESG algorithm has faster convergence rates than the H-ESG algorithm.

  • The H-MI-ESG estimation errors become smaller as the innovation length p increases, and the parameter estimates given by the H-MI-ESG algorithm converge to their true values as t increases.

From Table 3, it is clear that although a larger p leads to higher estimation accuracy, the price is a heavier computational load. Moreover, compared with the over-parameterization methods, the proposed algorithm reduces the computational load by decomposing a large-scale system into two subsystems and avoids the complexity of identifying products of parameters.

6 Conclusions

This paper uses the key-term separation principle to study the identification problem of the H-MIMO-OEMA system and proposes a hierarchical multi-innovation extended stochastic gradient algorithm with a forgetting factor. By expanding the single-innovation vectors, the proposed H-MI-ESG algorithm achieves higher estimation accuracy than the H-ESG algorithm. By separating a proper key term from the nonlinear system, we obtain a pseudo-linear model that avoids redundant parameters, so the proposed algorithm has high computational efficiency. However, this algorithm cannot be used to identify nonlinear systems with unknown time delays. The proposed algorithm can be extended to study identification algorithms for other linear and nonlinear systems with colored noises [37–39].