1 Introduction

Nonlinear time series models can reveal nonlinear features of many practical processes and are widely used in finance, ecology and other fields [1]. The exponential autoregressive (ExpAR) model [2] is an important class of nonlinear time series models. It was first applied to the statistical analysis of the Canadian lynx data [3, 4], where it proved suitable for describing nonlinear behaviors such as amplitude-dependent frequency, jump phenomena and limit cycles, and for producing accurate multistep-ahead predictions [5]. In recent years, many publications have been devoted to the stationarity, estimation and application of the ExpAR model. For example, Chen et al. discussed the stationarity conditions of several generalized ExpAR models, developed a variable projection based estimation algorithm, and adopted the generalized ExpAR models to model and predict the monthly mean ozone column thickness [6].

Analyzing and controlling a nonlinear time series process relies on an appropriate dynamical model. System identification is a common tool for constructing mathematical models of dynamical systems, and parameter estimation determines the unknown system parameters from a set of observations. Both are widely used in many areas [7,8,9]. Many identification methods, such as the maximum likelihood [10], the genetic algorithm [11], blind identification [12] and subspace identification [13], have been developed over the decades. The gradient-based methods form a class of fundamental system identification methods. Combined with recursive and iterative techniques, they can be applied to many kinds of systems. However, the basic gradient-based methods have limited parameter estimation accuracy. By introducing forgetting factors, variants of the gradient-based identification algorithms have been derived with improved parameter estimation accuracy. For instance, Chen and Jiang developed a gradient-based identification method with several forgetting factors for nonlinear two-variable difference systems [14].

In the area of system identification, many techniques have been exploited to improve the identification results. For example, hierarchical identification has developed into a significant branch of system identification [15]. Recently, a hierarchical gradient-based iterative algorithm was used to simultaneously estimate the unknown amplitudes and angular frequencies of multi-frequency signals [16]. In addition, multi-innovation identification has shown its effectiveness in nonlinear system identification [17]. By expanding a scalar innovation into a multi-dimensional vector, a multi-innovation stochastic gradient (SG) algorithm was derived for Wiener–Hammerstein systems with backlash [18], and a multi-innovation fractional order SG algorithm was developed for Hammerstein nonlinear ARMAX systems [19]. However, there is little research on nonlinear time series model identification using these novel identification techniques.

This communique investigates recursive identification algorithms for the ExpAR model. Applying the hierarchical identification principle, the ExpAR model is decomposed into two sub-identification (Sub-ID) models: one contains the unknown parameter vector of the linear subsystem, and the other contains the unknown parameter of the nonlinear part. With the negative gradient search, the two unknown parameter sets are estimated interactively. In order to make full use of the available information, the scalar innovations are expanded into innovation vectors. Moreover, two forgetting factors are introduced into the multi-innovation algorithm, yielding a new recursive identification algorithm with improved parameter estimation accuracy. In brief, the contributions of this paper are as follows.

  • Considering the difficulty of the nonlinear optimal problem arising in identifying the ExpAR model, we combine the hierarchical identification principle with the negative gradient search so as to derive a hierarchical stochastic gradient (H-SG) algorithm for the ExpAR model.

  • Using the multi-innovation identification theory, a hierarchical multi-innovation stochastic gradient (H-MISG) algorithm is presented for the ExpAR model. Introducing two forgetting factors, we obtain a modified H-MISG algorithm.

  • Comparing the parameter estimation accuracies of the proposed hierarchical algorithms, we find that the modified version of the H-MISG algorithm has improved parameter estimation accuracy and can be effectively used to identify the ExpAR model.

2 Problem description

Some notations used throughout this paper are first introduced in Table 1.

Table 1 The notations used throughout this paper

Given a time series \(\{x_k,x_{k-1},x_{k-2},\ldots \}\), an ExpAR model can be expressed as

$$\begin{aligned} x_k= & {} \left( \alpha _1+\beta _1\mathrm{e}^{-\xi x^2_{k-1}}\right) x_{k-1}\nonumber \\&+\,\left( \alpha _2+\beta _2\mathrm{e}^{-\xi x^2_{k-1}}\right) x_{k-2}+\cdots \nonumber \\&+\,\left( \alpha _n+\beta _n\mathrm{e}^{-\xi x^2_{k-1}}\right) x_{k-n}+\varepsilon _k, \end{aligned}$$
(1)

where \(\varepsilon _k\) is white noise with zero mean, n denotes the degree, and \(\alpha _i\), \(\beta _i\) and \(\xi \) are the model parameters to be estimated.

When the parameters \(\beta _i=0\), \(i=1,2,\ldots ,n\), Eq. (1) reduces to an autoregressive (AR) model which has no nonlinear dynamics.
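To make the data-generating mechanism concrete, the following minimal Python/NumPy sketch simulates a series from model (1) under zero initial values; the function name and the illustrative parameter values are placeholders rather than quantities used later in the paper.

```python
import numpy as np

def simulate_expar(alpha, beta, xi, N, sigma=0.2, seed=0):
    """Simulate model (1): x_k = sum_i (alpha_i + beta_i*exp(-xi*x_{k-1}^2)) x_{k-i} + eps_k."""
    rng = np.random.default_rng(seed)
    n = len(alpha)
    x = np.zeros(N + n)                                   # zero initial values, as assumed below
    for k in range(n, N + n):
        w = alpha + beta * np.exp(-xi * x[k - 1] ** 2)    # amplitude-dependent AR coefficients
        x[k] = w @ x[k - n:k][::-1] + sigma * rng.standard_normal()
    return x[n:]

# e.g. a second-degree ExpAR process with illustrative (hypothetical) parameter values
x = simulate_expar(np.array([0.8, -0.2]), np.array([0.5, 0.3]), xi=1.0, N=3000)
```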

The form in (1) is the classic ExpAR model; some modified versions have also been presented. For instance, in order to give a more sophisticated specification of the dynamics of the characteristic roots of AR models, Ozaki derived a variant of the ExpAR model in [3] using Hermite-type polynomials:

$$\begin{aligned} x_k=\sum \limits _{i=1}^{n}\Big [\alpha _i +\Big (\beta _{i0}+\sum \limits _{j=1}^{m_i} \beta _{ij}x^j_{k-1}\Big )\mathrm{e}^{-\xi x^2_{k-1}}\Big ]x_{k-i}+\varepsilon _k. \end{aligned}$$

Introducing a time-delay d and a scalar parameter \(\zeta \), Teräsvirta developed a different variant of the ExpAR model in [4]:

$$\begin{aligned} x_k= & {} \left[ \alpha _0+\beta _0\mathrm{e}^{-\xi (x_{k-d}-\zeta )^2}\right] \\&+\, \sum \limits _{i=1}^{n}\left[ \alpha _i+\beta _i\mathrm{e}^{-\xi (x_{k-d} -\zeta )^2}\right] x_{k-i}+\varepsilon _k. \end{aligned}$$

Some other generalized ExpAR models were summarized in [6]. After parametrization, the corresponding identification models, which have different parameter and information vectors, can be derived for the ExpAR family. This paper addresses the recursive identification of the classic ExpAR model; the proposed hierarchical algorithms are also applicable to the other ExpAR models.

Assume that the degree n is known and that the data \(x_k\) are measurable. The initial values are taken as \(x_k=0\) and \(\varepsilon _k=0\) for \(k\le 0\).

Fig. 1 The hierarchical structure of the identification models for the ExpAR model

It is obvious that \(x_k\) is linear with respect to the parameters \(\alpha _i\) and \(\beta _i\), and is nonlinear with respect to the parameter \(\xi \). Define the parameter vectors of the linear subsystem

$$\begin{aligned} \varvec{\alpha }:=\,&[\alpha _1,\alpha _2,\ldots ,\alpha _n]^{\tiny \mathrm{T}}\in \mathbb {R}^n,\\ \varvec{\beta }:=\,&[\beta _1,\beta _2,\ldots ,\beta _n]^{\tiny \mathrm{T}}\in \mathbb {R}^n, \end{aligned}$$

and the information vector

$$\begin{aligned} \varvec{X}_k := [x_{k-1},x_{k-2},\ldots ,x_{k-n}]^{\tiny \mathrm{T}}\in \mathbb {R}^n. \end{aligned}$$

Then, Eq. (1) can be transformed into

$$\begin{aligned} x_k= & {} \sum \limits _{i=1}^{n}\alpha _ix_{k-i}+\mathrm{e}^{-\xi x^2_{k-1}}\sum \limits _{i=1}^{n}\beta _ix_{k-i}+\varepsilon _k\nonumber \\= & {} \varvec{X}^{\tiny \mathrm{T}}_k\varvec{\alpha }+\mathrm{e}^{-\xi x^2_{k-1}}\varvec{X}^{\tiny \mathrm{T}}_k\varvec{\beta }+\varepsilon _k. \end{aligned}$$
(2)

Furthermore, define the following vectors:

$$\begin{aligned}&\varvec{\varTheta }:=\, [\varvec{\alpha }^{\tiny \mathrm{T}},\varvec{\beta }^{\tiny \mathrm{T}}]^{\tiny \mathrm{T}}\in \mathbb {R}^{2n},\\&\varvec{\phi }(\xi ,k) :=\, [\varvec{X}^{\tiny \mathrm{T}}_k,\mathrm{e}^{-\xi x^2_{k-1}}\varvec{X}^{\tiny \mathrm{T}}_k]^{\tiny \mathrm{T}}\in \mathbb {R}^{2n}. \end{aligned}$$

Then, Eq. (2) can be equivalently transformed into the identification model

$$\begin{aligned} x_k=\varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)\varvec{\varTheta }+\varepsilon _k. \end{aligned}$$
(3)

Since the unknown parameter \(\xi \) of the nonlinear subsystem enters \(\varvec{\phi }(\xi ,k)\), the identification problem becomes a nonlinear optimization problem and the least-squares method cannot be applied directly. The present work therefore explores new recursive identification methods for the ExpAR model.
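As an illustration of this parametrization, a small sketch (Python/NumPy, hypothetical helper name) that forms the information vector \(\varvec{\phi }(\xi ,k)\) of model (3) from a data array is given below.

```python
import numpy as np

def phi(x, k, xi, n):
    """Information vector phi(xi,k) = [X_k^T, exp(-xi*x_{k-1}^2) X_k^T]^T of model (3);
    x is the data array (0-based indexing) and k is an index with k >= n."""
    Xk = x[k - n:k][::-1]                              # X_k = [x_{k-1}, ..., x_{k-n}]^T
    return np.concatenate([Xk, np.exp(-xi * x[k - 1] ** 2) * Xk])

# with Theta = [alpha^T, beta^T]^T, the model reads: x[k] is phi(x, k, xi, n) @ Theta plus noise
```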

3 The hierarchical stochastic gradient algorithm

Hierarchical identification is a decomposition-based identification approach. The key idea is to decompose the identification model into several subsystems so that the scale of the optimization problem becomes smaller [20]. In this section, by the hierarchical identification principle, the ExpAR model is decomposed into two subsystems, one containing \(\varvec{\varTheta }\) and the other containing \(\xi \); both parameter sets are to be estimated. The negative gradient search is widely adopted to solve such optimization problems by seeking the extreme point of an objective function. Applying the negative gradient search, an H-SG algorithm is proposed for the ExpAR model.

Define the information item \(\psi (\varvec{\beta })\) and the intermediate variable \(x_{1,k}\) as

$$\begin{aligned}&\psi (\varvec{\beta }) :=\, \varvec{X}^{\tiny \mathrm{T}}_k\varvec{\beta }\in \mathbb {R}, \\&x_{1,k} :=\, x_k-\varvec{X}^{\tiny \mathrm{T}}_k\varvec{\alpha }\in \mathbb {R}. \end{aligned}$$

From (2), we can see that the ExpAR model is decomposed into these two Sub-ID models:

$$\begin{aligned}&S_1: x_k = \varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)\varvec{\varTheta }+\varepsilon _k, \end{aligned}$$
(4)
$$\begin{aligned}&S_2: x_{1,k} = \psi (\varvec{\beta })\mathrm{e}^{-\xi x^2_{k-1}}+\varepsilon _k. \end{aligned}$$
(5)

The parameter sets \(\varvec{\varTheta }\) and \(\xi \) in Sub-ID models (4) and (5) contain all the parameters to be estimated. The parameter \(\xi \) in \(\varvec{\phi }(\xi ,k)\) and the parameter vector \(\varvec{\beta }\) in \(\psi (\varvec{\beta })\) are the associated terms that couple these two Sub-ID models. Decomposing the identification model in (2) or (3) into the above fictitious subsystems, we obtain the hierarchical structure shown in Fig. 1.

Define two criterion functions

$$\begin{aligned}&J_1(\varvec{\varTheta }) :=\, \frac{1}{2}[x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)\varvec{\varTheta }]^2, \end{aligned}$$
(6)
$$\begin{aligned}&J_2(\xi ) :=\, \frac{1}{2}[x_{1,k}-\psi (\varvec{\beta })\mathrm{e}^{-\xi x^2_{k-1}}]^2. \end{aligned}$$
(7)

Computing the gradients of \(J_1(\varvec{\varTheta })\) and \(J_2(\xi )\), we have

$$\begin{aligned} \mathrm{grad}[J_1(\varvec{\varTheta })]= & {} \frac{\partial J_1(\varvec{\varTheta })}{\partial \varvec{\varTheta }} =-\varvec{\phi }(\xi ,k)[x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)\varvec{\varTheta }],\\ \mathrm{grad}[J_2(\xi )]= & {} \frac{\partial J_2(\xi )}{\partial \xi }\\= & {} x^2_{k-1}\psi (\varvec{\beta })\mathrm{e}^{-\xi x^2_{k-1}} {[}x_{1,k}-\psi (\varvec{\beta })\mathrm{e}^{-\xi x^2_{k-1}}] \\= & {} x^2_{k-1}\psi (\varvec{\beta })\mathrm{e}^{-\xi x^2_{k-1}} {[}x_k-\varvec{X}^{\tiny \mathrm{T}}_k\varvec{\alpha }\\&-\psi (\varvec{\beta })\mathrm{e}^{-\xi x^2_{k-1}}] \\= & {} -\varvec{\varTheta }^{\tiny \mathrm{T}}\varvec{\phi }'(\xi ,k) {[}x_k-\varvec{X}^{\tiny \mathrm{T}}_k\varvec{\alpha }-\psi (\varvec{\beta })\mathrm{e}^{-\xi x^2_{k-1}}] \\= & {} -\varvec{\varTheta }^{\tiny \mathrm{T}}\varvec{\phi }'(\xi ,k)[x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)\varvec{\varTheta }], \end{aligned}$$

where

$$\begin{aligned} \varvec{\phi }'(\xi ,k):=\,&\frac{\partial \varvec{\phi }(\xi ,k)}{\partial \xi }\\ =&[\mathbf{0}^{\tiny \mathrm{T}}_n,-x^2_{k-1}\mathrm{e}^{-\xi x^2_{k-1}}\varvec{X}^{\tiny \mathrm{T}}_k]^{\tiny \mathrm{T}}\in \mathbb {R}^{2n}. \end{aligned}$$

Let \(\hat{\varvec{\varTheta }}_k\) and \(\hat{\xi }_{k}\) signify the estimates of \(\varvec{\varTheta }\) and \(\xi \) at time k, and let \(\mu _{1,k}\) and \(\mu _{2,k}\) represent the step-sizes to be determined later. Employing the negative gradient search, we have:

$$\begin{aligned} \hat{\varvec{\varTheta }}_k= & {} \hat{\varvec{\varTheta }}_{k-1} -\mu _{1,k}\mathrm{grad}[J_1(\hat{\varvec{\varTheta }}_{k-1})]\nonumber \\= & {} \hat{\varvec{\varTheta }}_{k-1}+\mu _{1,k}\varvec{\phi }(\xi ,k)[x_k -\varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)\hat{\varvec{\varTheta }}_{k-1}], \end{aligned}$$
(8)
$$\begin{aligned} \hat{\xi }_{k}= & {} \hat{\xi }_{k-1}-\mu _{2,k} \mathrm{grad}[J_2(\hat{\xi }_{k-1})] \nonumber \\= & {} \hat{\xi }_{k-1}+\mu _{2,k}\varvec{\varTheta }^{\tiny \mathrm{T}}\varvec{\phi }'(\hat{\xi }_{k-1},k) {[}x_k{-}\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k)\varvec{\varTheta }].\nonumber \\ \end{aligned}$$
(9)

We now determine the optimal step-sizes \(\mu _{1,k}\) and \(\mu _{2,k}\). One method is to apply a one-dimensional search, that is, to solve the optimization problems

$$\begin{aligned}&\mathop {\min }_{\mu _{1,k}\ge 0}J_1\{\hat{\varvec{\varTheta }}_{k-1}- \mu _{1,k}\mathrm{grad}[J_1(\hat{\varvec{\varTheta }}_{k-1})]\},\\&\mathop {\min }_{\mu _{2,k}\ge 0}J_2\{\hat{\xi }_{k-1}- \mu _{2,k}\mathrm{grad}[J_2(\hat{\xi }_{k-1})]\}. \end{aligned}$$

Remark 1

The one-dimensional search is a fundamental method for finding the optimal step-size in a minimization problem. The key idea is to determine the negative gradient direction (i.e., the direction in which the criterion function descends fastest) and to compute, by a one-dimensional search along that direction, the step-size that minimizes the criterion function.

For the sake of convenience, we define the innovations \(e_{1,k}\) and \(e_{2,k}\) as

$$\begin{aligned} e_{1,k} :=\,&x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)\hat{\varvec{\varTheta }}_{k-1}\in \mathbb {R}, \end{aligned}$$
(10)
$$\begin{aligned} e_{2,k} :=\,&x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k)\varvec{\varTheta }\in \mathbb {R}. \end{aligned}$$
(11)

Substituting \(\varvec{\varTheta }=\hat{\varvec{\varTheta }}_k\) into (6) gives

$$\begin{aligned} g_1[\mu _{1,k}] :=\,&J_1[\hat{\varvec{\varTheta }}_k]=\frac{1}{2}[x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)\hat{\varvec{\varTheta }}_k]^2 \\ =\,&\frac{1}{2}\{x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)[\hat{\varvec{\varTheta }}_{k-1}+\mu _{1,k}\varvec{\phi }(\xi ,k)e_{1,k}]\}^2 \\ =\,&\frac{1}{2}\{x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)\hat{\varvec{\varTheta }}_{k-1}-\mu _{1,k}\Vert \varvec{\phi }(\xi ,k)\Vert ^2e_{1,k}\}^2 \\ =\,&\frac{1}{2}\{e_{1,k}-\mu _{1,k}\Vert \varvec{\phi }(\xi ,k)\Vert ^2e_{1,k}\}^2 \\ =\,&\frac{1}{2}\{1-\mu _{1,k}\Vert \varvec{\phi }(\xi ,k)\Vert ^2\}^2e_{1,k}^2. \end{aligned}$$

In order to make \(J_1[\hat{\varvec{\varTheta }}_k]\) minimum, we take the optimal step-size \(\mu _{1,k}\) as

$$\begin{aligned} \mu _{1,k}=\frac{1}{\Vert \varvec{\phi }(\xi ,k)\Vert ^2}. \end{aligned}$$
(12)

To avoid the denominator being zero, the above equation can be modified to

$$\begin{aligned} \mu _{1,k}=\frac{1}{1+\Vert \varvec{\phi }(\xi ,k)\Vert ^2}. \end{aligned}$$
(13)

Substituting (12) or (13) into (8) gives the gain vector \(\frac{\varvec{\phi }(\xi ,k)}{\Vert \varvec{\phi }(\xi ,k)\Vert ^2}\) or \(\frac{\varvec{\phi }(\xi ,k)}{1+\Vert \varvec{\phi }(\xi ,k)\Vert ^2}\). Neither of these gain vectors approaches zero as k increases. From (8), we can see that when \(\hat{\varvec{\varTheta }}_{k-1}\) is close to \(\varvec{\varTheta }\), a large gain vector \(\mu _{1,k}\varvec{\phi }(\xi ,k)\) will make \(\hat{\varvec{\varTheta }}_k\) deviate from \(\varvec{\varTheta }\). To address this problem, we let the step-size \(\mu _{1,k}\) tend to zero as k increases. Therefore, \(\mu _{1,k}\) is taken as

$$\begin{aligned} \mu _{1,k} :=\,&\frac{1}{r_{1,k}}, \nonumber \\ r_{1,k} =\,&r_{1,k-1}+\Vert \varvec{\phi }(\xi ,k)\Vert ^2. \end{aligned}$$
(14)

Similarly, substituting \(\xi =\hat{\xi }_{k}\) into (7) gives

$$\begin{aligned} g_2[\mu _{2,k}] :=\,&J_2[\hat{\xi }_{k}]=\frac{1}{2}[x_{1,k} -\psi (\varvec{\beta })\mathrm{e}^{-\hat{\xi }_{k} x^2_{k-1}}]^2 \\ =\,&\frac{1}{2}[x_k-\varvec{X}^{\tiny \mathrm{T}}_k\varvec{\alpha }-\psi (\varvec{\beta }) \mathrm{e}^{-\hat{\xi }_{k} x^2_{k-1}}]^2 \\ =\,&\frac{1}{2}[x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k},k)\varvec{\varTheta }]^2. \end{aligned}$$

Plugging the first-order Taylor expansion of \(\varvec{\phi }(\xi ,k)\) at \(\xi =\hat{\xi }_{k-1}\) into the above equation, we have

$$\begin{aligned} g_2[\mu _{2,k}]&= \frac{1}{2}\{x_k-[\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k)\\&\quad +\,[\varvec{\phi }'(\hat{\xi }_{k-1},k)]^{\tiny \mathrm{T}}(\hat{\xi }_{k}-\hat{\xi }_{k-1})\\&\quad +\, o(\hat{\xi }_{k}-\hat{\xi }_{k-1})]\varvec{\varTheta }\}^2 \\&= \frac{1}{2}\{x_k-[\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k)\\&\quad +\,[\varvec{\phi }'(\hat{\xi }_{k-1},k)]^{\tiny \mathrm{T}}[\mu _{2,k}\varvec{\varTheta }^{\tiny \mathrm{T}} \varvec{\phi }'(\hat{\xi }_{k-1},k)e_{2,k}]\\&\quad +\,o(\hat{\xi }_{k} -\hat{\xi }_{k-1})]\varvec{\varTheta }\}^2 \\&= \frac{1}{2}[x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k) \varvec{\varTheta }\\&\quad -[\varvec{\phi }'(\hat{\xi }_{k-1},k)]^{\tiny \mathrm{T}} {[}\mu _{2,k}\varvec{\varTheta }^{\tiny \mathrm{T}}\varvec{\phi }'(\hat{\xi }_{k-1},k) e_{2,k}]\varvec{\varTheta }\\&\quad +\,o(\hat{\xi }_{k}-\hat{\xi }_{k-1})]^2 \\&= \frac{1}{2}[e_{2,k}-\mu _{2,k}\Vert \varvec{\varTheta }^{\tiny \mathrm{T}} \varvec{\phi }'(\hat{\xi }_{k-1},k)\Vert ^2e_{2,k}\\&\quad +\,o(\hat{\xi }_{k}-\hat{\xi }_{k-1})]^2 \\&= \frac{1}{2}[1-\mu _{2,k}\Vert \varvec{\varTheta }^{\tiny \mathrm{T}} \varvec{\phi }'(\hat{\xi }_{k-1},k)\Vert ^2]^2e_{2,k}^2\\&\quad +\,o(\hat{\xi }_{k}-\hat{\xi }_{k-1})^2. \end{aligned}$$

The optimal \(\mu _{2,k}\) can be obtained by minimizing \(g_2[\mu _{2,k}]\), i.e., by solving the equation

$$\begin{aligned} 1-\mu _{2,k}\Vert \varvec{\varTheta }^{\tiny \mathrm{T}}\varvec{\phi }'(\hat{\xi }_{k-1},k)\Vert ^2=0. \end{aligned}$$

Thus, the step-size \(\mu _{2,k}\) can be chosen as

$$\begin{aligned} \mu _{2,k}=\frac{1}{\Vert \varvec{\varTheta }^{\tiny \mathrm{T}}\varvec{\phi }'(\hat{\xi }_{k-1},k)\Vert ^2}. \end{aligned}$$

Similarly, considering the stability of the identification algorithm, the above equation can be modified to

$$\begin{aligned}&\mu _{2,k}:=\,\frac{1}{r_{2,k}}, \nonumber \\&r_{2,k}=\,r_{2,k-1}+\Vert \varvec{\varTheta }^{\tiny \mathrm{T}}\varvec{\phi }'(\hat{\xi }_{k-1},k)\Vert ^2. \end{aligned}$$
(15)

Plugging (10) and (14) into (8), and (11) and (15) into (9), we obtain the following recursive relations:

$$\begin{aligned}&\hat{\varvec{\varTheta }}_k = \hat{\varvec{\varTheta }}_{k-1} +\frac{1}{r_{1,k}}\varvec{\phi }(\xi ,k)e_{1,k}, \end{aligned}$$
(16)
$$\begin{aligned}&e_{1,k} = x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\xi ,k)\hat{\varvec{\varTheta }}_{k-1}, \end{aligned}$$
(17)
$$\begin{aligned}&r_{1,k} = r_{1,k-1}+\Vert \varvec{\phi }(\xi ,k)\Vert ^2, \end{aligned}$$
(18)
$$\begin{aligned}&\hat{\xi }_{k} = \hat{\xi }_{k-1}+\frac{1}{r_{2,k}} \varvec{\varTheta }^{\tiny \mathrm{T}}\varvec{\phi }'(\hat{\xi }_{k-1},k)e_{2,k}, \end{aligned}$$
(19)
$$\begin{aligned}&e_{2,k} = x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k)\varvec{\varTheta }, \end{aligned}$$
(20)
$$\begin{aligned}&r_{2,k} = r_{2,k-1}+\Vert \varvec{\varTheta }^{\tiny \mathrm{T}}\varvec{\phi }'(\hat{\xi }_{k-1},k)\Vert ^2. \end{aligned}$$
(21)

Here, a difficulty arises: the parameter sets \(\varvec{\varTheta }\) and \(\xi \) appear on the right-hand sides of (16)–(21) but are unknown, so the algorithm in (16)–(21) cannot be implemented. Inspired by the hierarchical identification principle, we replace the unknown \(\xi \) in (16)–(18) and \(\varvec{\varTheta }\) in (19)–(21) with their estimates \(\hat{\xi }_{k-1}\) and \(\hat{\varvec{\varTheta }}_k\). It follows that

$$\begin{aligned}&\hat{\varvec{\varTheta }}_k = \hat{\varvec{\varTheta }}_{k-1} +\frac{1}{r_{1,k}}\varvec{\phi }(\hat{\xi }_{k-1},k)e_{1,k}, \end{aligned}$$
(22)
$$\begin{aligned}&e_{1,k} = x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k)\hat{\varvec{\varTheta }}_{k-1}, \end{aligned}$$
(23)
$$\begin{aligned}&r_{1,k} = r_{1,k-1}+\Vert \varvec{\phi }(\hat{\xi }_{k-1},k)\Vert ^2, \end{aligned}$$
(24)
$$\begin{aligned}&\varvec{\phi }(\hat{\xi }_{k-1},k) = [\varvec{X}^{\tiny \mathrm{T}}_k,\mathrm{e}^{-\hat{\xi }_{k-1} x^2_{k-1}}\varvec{X}^{\tiny \mathrm{T}}_k]^{\tiny \mathrm{T}}, \end{aligned}$$
(25)
$$\begin{aligned}&\varvec{X}_k = [x_{k-1},x_{k-2},\ldots ,x_{k-n}]^{\tiny \mathrm{T}}, \end{aligned}$$
(26)
$$\begin{aligned}&\hat{\varvec{\varTheta }}_k = [\hat{\varvec{\alpha }}^{\tiny \mathrm{T}}_k,\hat{\varvec{\beta }}^{\tiny \mathrm{T}}_k]^{\tiny \mathrm{T}}, \end{aligned}$$
(27)
$$\begin{aligned}&\hat{\xi }_{k} = \hat{\xi }_{k-1}+\frac{1}{r_{2,k}}\hat{\varvec{\varTheta }}^{\tiny \mathrm{T}}_k \varvec{\phi }'(\hat{\xi }_{k-1},k)e_{2,k}, \end{aligned}$$
(28)
$$\begin{aligned}&e_{2,k} = x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k)\hat{\varvec{\varTheta }}_k, \end{aligned}$$
(29)
$$\begin{aligned}&r_{2,k} = r_{2,k-1}+\Vert \hat{\varvec{\varTheta }}^{\tiny \mathrm{T}}_k \varvec{\phi }'(\hat{\xi }_{k-1},k)\Vert ^2, \end{aligned}$$
(30)
$$\begin{aligned}&\varvec{\phi }'(\hat{\xi }_{k-1},k) = [\mathbf{0}_n^{\tiny \mathrm{T}},-x^2_{k-1} \mathrm{e}^{-\hat{\xi }_{k-1} x^2_{k-1}}\varvec{X}^{\tiny \mathrm{T}}_k]^{\tiny \mathrm{T}}. \end{aligned}$$
(31)

The above computational process forms the H-SG algorithm for the ExpAR model.

The process of computing \(\hat{\varvec{\varTheta }}_k\) and \(\hat{\xi }_{k}\) by the H-SG algorithm is summarized in the following list; a code sketch of the recursion is given after the list.

  1. To initialize, let \(k=1\), \(\hat{\varvec{\varTheta }}_0= [\hat{\varvec{\alpha }}^{\tiny \mathrm{T}}_0,\hat{\varvec{\beta }}^{\tiny \mathrm{T}}_0]^{\tiny \mathrm{T}}=\mathbf{1}_{2n}/p_0\), \(\hat{\xi }_0=1/p_0\), \(p_0=10^6\), \(r_{1,0}=1\) and \(r_{2,0}=1\), and give an error tolerance \(\eta >0\).

  2. Collect the measurement data \(x_k\), and form the information vectors \(\varvec{X}_k\) and \(\varvec{\phi }(\hat{\xi }_{k-1},k)\) by (26) and (25).

  3. Compute the reciprocal of the step-size \(r_{1,k}\) by (24) and the innovation \(e_{1,k}\) by (23).

  4. Update the parameter estimation vector \(\hat{\varvec{\varTheta }}_k\) by (22), and read out \(\hat{\varvec{\alpha }}_k\) and \(\hat{\varvec{\beta }}_k\) from \(\hat{\varvec{\varTheta }}_k\) in (27).

  5. Form the derivative of \(\varvec{\phi }(\hat{\xi }_{k-1},k)\) with respect to \(\hat{\xi }_{k-1}\) by (31).

  6. Compute the reciprocal of the step-size \(r_{2,k}\) by (30) and the innovation \(e_{2,k}\) by (29).

  7. Update the parameter estimate \(\hat{\xi }_{k}\) by (28).

  8. Compare \(\{\hat{\varvec{\varTheta }}_k,\hat{\xi }_{k}\}\) with \(\{\hat{\varvec{\varTheta }}_{k-1},\hat{\xi }_{k-1}\}\): if \(\Vert \hat{\varvec{\varTheta }}_k-\hat{\varvec{\varTheta }}_{k-1}\Vert +\Vert \hat{\xi }_{k}-\hat{\xi }_{k-1}\Vert >\eta \), increase k by 1 and return to Step 2; otherwise, terminate this computational process.
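The following minimal Python sketch illustrates the recursion (22)–(31), reusing the hypothetical phi helper sketched in Sect. 2; it is an illustrative implementation under the stated initialization, not a reference one.

```python
import numpy as np

def dphi(x, k, xi, n):
    """Derivative phi'(xi,k) of Eq. (31)."""
    Xk = x[k - n:k][::-1]
    return np.concatenate([np.zeros(n), -x[k - 1] ** 2 * np.exp(-xi * x[k - 1] ** 2) * Xk])

def hsg(x, n, p0=1e6, eta=1e-8):
    """H-SG estimation of Theta = [alpha^T, beta^T]^T and xi, Eqs. (22)-(31)."""
    Theta, xi = np.ones(2 * n) / p0, 1.0 / p0
    r1 = r2 = 1.0
    for k in range(n, len(x)):
        ph = phi(x, k, xi, n)
        r1 += ph @ ph                                    # (24)
        e1 = x[k] - ph @ Theta                           # (23)
        Theta_new = Theta + ph * e1 / r1                 # (22)
        g = Theta_new @ dphi(x, k, xi, n)                # Theta_k^T phi'(xi_{k-1}, k)
        r2 += g ** 2                                     # (30)
        e2 = x[k] - ph @ Theta_new                       # (29)
        xi_new = xi + g * e2 / r2                        # (28)
        done = np.linalg.norm(Theta_new - Theta) + abs(xi_new - xi) <= eta
        Theta, xi = Theta_new, xi_new
        if done:
            break
    return Theta, xi
```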

The H-SG algorithm in (22)–(31) estimates the parameter sets \(\varvec{\varTheta }\) and \(\xi \) in an interactive way. The innovations \(e_{1,k}\) and \(e_{2,k}\) in (23) and (29) are scalars. In order to make the most of the information, we derive an interactive multi-innovation parameter estimation method in the next section.

4 The hierarchical multi-innovation stochastic gradient algorithm

The innovation is the useful information that can improve the parameter and state estimation accuracy. Multi-innovation identification is innovation-expansion-based identification [21]. Applying the multi-innovation identification theory, we expand the scalar innovations \(e_{1,k}\) and \(e_{2,k}\) in (23) and (29) and develop an H-MISG algorithm for the ExpAR model in this section.

Let l denote the innovation length. Expand the scalar innovations in (23) and (29) into the l-dimensional vectors:

$$\begin{aligned} \varvec{E}_1(l) :=\,&\left[ \begin{array}{c} x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k) \hat{\varvec{\varTheta }}_{k-1}\\ x_{k-1}-\varvec{\phi }^{\tiny \mathrm{T}} (\hat{\xi }_{k-1},k-1)\hat{\varvec{\varTheta }}_{k-1}\\ \vdots \\ x_{k-l+1}-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k-l+1) \hat{\varvec{\varTheta }}_{k-1}\\ \end{array}\right] \in \mathbb {R}^l, \\ \varvec{E}_2(l) :=\,&\left[ \begin{array}{c} x_k-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k)\hat{\varvec{\varTheta }}_k\\ x_{k-1}-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k-1)\hat{\varvec{\varTheta }}_k\\ \vdots \\ x_{k-l+1}-\varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k-l+1)\hat{\varvec{\varTheta }}_k\\ \end{array}\right] \in \mathbb {R}^l. \end{aligned}$$

Define the following stacked vector and matrix:

$$\begin{aligned} \varvec{X}(l) :=\,&\left[ \begin{array}{c} x_k\\ x_{k-1}\\ \vdots \\ x_{k-l+1}\\ \end{array}\right] \in \mathbb {R}^l, \\ \varvec{\varPhi }(l,\hat{\xi }_{k-1}) :=\,&\left[ \begin{array}{c} \varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k)\\ \varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k-1)\\ \vdots \\ \varvec{\phi }^{\tiny \mathrm{T}}(\hat{\xi }_{k-1},k-l+1)\\ \end{array}\right] ^{\tiny \mathrm{T}} \in \mathbb {R}^{(2n)\times l}. \end{aligned}$$

Then, the innovation vectors can be equivalently transformed into

$$\begin{aligned} \varvec{E}_1(l)= & {} \varvec{X}(l)-\varvec{\varPhi }^{\tiny \mathrm{T}}(l,\hat{\xi }_{k-1})\hat{\varvec{\varTheta }}_{k-1},\\ \varvec{E}_2(l)= & {} \varvec{X}(l)-\varvec{\varPhi }^{\tiny \mathrm{T}}(l,\hat{\xi }_{k-1})\hat{\varvec{\varTheta }}_k. \end{aligned}$$

Since \(\varvec{E}_1(l)=e_{1,k}\), \(\varvec{\varPhi }(l,\hat{\xi }_{k-1})=\varvec{\phi }(\hat{\xi }_{k-1},k)\) and \(\varvec{X}(l)=x_k\) for \(l=1\), Eq. (22) can be written as

$$\begin{aligned} \hat{\varvec{\varTheta }}_k=\hat{\varvec{\varTheta }}_{k-1} +\frac{1}{r_{1,k}}\varvec{\varPhi }(l,\hat{\xi }_{k-1})\varvec{E}_1(l). \end{aligned}$$

Similarly, Eq. (28) can be transformed into

$$\begin{aligned} \hat{\xi }_{k}=\hat{\xi }_{k-1}+\frac{1}{r_{2,k}} \hat{\varvec{\varTheta }}^{\tiny \mathrm{T}}_k\varvec{\varPhi }'(l,\hat{\xi }_{k-1})\varvec{E}_2(l), \end{aligned}$$

where

$$\begin{aligned} \varvec{\varPhi }'(l,\hat{\xi }_{k-1}) :=\,&[\varvec{\phi }'(\hat{\xi }_{k-1},k), \varvec{\phi }'(\hat{\xi }_{k-1},k-1),\\&\ldots ,\varvec{\phi }'(\hat{\xi }_{k-1},k-l+1)] \in \mathbb {R}^{(2n)\times l}. \end{aligned}$$

In summary, the H-MISG algorithm for the ExpAR model can be derived as follows:

$$\begin{aligned}&\hat{\varvec{\varTheta }}_k = \hat{\varvec{\varTheta }}_{k-1}+\frac{1}{r_{1,k}} \varvec{\varPhi }(l,\hat{\xi }_{k-1})\varvec{E}_1(l), \end{aligned}$$
(32)
$$\begin{aligned}&\varvec{E}_1(l)= \varvec{X}(l)-\varvec{\varPhi }^{\tiny \mathrm{T}}(l,\hat{\xi }_{k-1})\hat{\varvec{\varTheta }}_{k-1}, \end{aligned}$$
(33)
$$\begin{aligned}&r_{1,k} = r_{1,k-1}+\Vert \varvec{\phi }(\hat{\xi }_{k-1},k)\Vert ^2, \end{aligned}$$
(34)
$$\begin{aligned}&\varvec{X}(l) = [x_k,x_{k-1},\ldots ,x_{k-l+1}]^{\tiny \mathrm{T}}, \end{aligned}$$
(35)
$$\begin{aligned}&\varvec{\varPhi }(l,\hat{\xi }_{k-1}) = [\varvec{\phi }(\hat{\xi }_{k-1},k), \varvec{\phi }(\hat{\xi }_{k-1},k-1),\nonumber \\&\quad \ldots ,\varvec{\phi }(\hat{\xi }_{k-1},k-l+1)], \end{aligned}$$
(36)
$$\begin{aligned}&\varvec{\phi }(\hat{\xi }_{k-1},k) = [\varvec{X}^{\tiny \mathrm{T}}_k, \mathrm{e}^{-\hat{\xi }_{k-1} x^2_{k-1}}\varvec{X}^{\tiny \mathrm{T}}_k]^{\tiny \mathrm{T}}, \end{aligned}$$
(37)
$$\begin{aligned}&\varvec{X}_k = [x_{k-1},x_{k-2},\ldots ,x_{k-n}]^{\tiny \mathrm{T}}, \end{aligned}$$
(38)
$$\begin{aligned}&\hat{\varvec{\varTheta }}_k = [\hat{\varvec{\alpha }}^{\tiny \mathrm{T}}_k,\hat{\varvec{\beta }}^{\tiny \mathrm{T}}_k]^{\tiny \mathrm{T}}, \end{aligned}$$
(39)
$$\begin{aligned}&\hat{\xi }_{k} = \hat{\xi }_{k-1}+\frac{1}{r_{2,k}}\hat{\varvec{\varTheta }}^{\tiny \mathrm{T}}_k \varvec{\varPhi }'(l,\hat{\xi }_{k-1})\varvec{E}_2(l), \end{aligned}$$
(40)
$$\begin{aligned}&\varvec{E}_2(l)= \varvec{X}(l)-\varvec{\varPhi }^{\tiny \mathrm{T}}(l,\hat{\xi }_{k-1})\hat{\varvec{\varTheta }}_k, \end{aligned}$$
(41)
$$\begin{aligned}&r_{2,k} = r_{2,k-1}+\Vert \hat{\varvec{\varTheta }}^{\tiny \mathrm{T}}_k\varvec{\phi }'(\hat{\xi }_{k-1},k)\Vert ^2, \end{aligned}$$
(42)
$$\begin{aligned}&\varvec{\phi }'(\hat{\xi }_{k-1},k) = [\mathbf{0}_n^{\tiny \mathrm{T}},-x^2_{k-1} \mathrm{e}^{-\hat{\xi }_{k-1} x^2_{k-1}}\varvec{X}^{\tiny \mathrm{T}}_k]^{\tiny \mathrm{T}}, \end{aligned}$$
(43)
$$\begin{aligned}&\varvec{\varPhi }'(l,\hat{\xi }_{k-1}) = [\varvec{\phi }'(\hat{\xi }_{k-1},k), \varvec{\phi }'(\hat{\xi }_{k-1},k-1),\nonumber \\&\quad \ldots ,\varvec{\phi }'(\hat{\xi }_{k-1},k-l+1)]. \end{aligned}$$
(44)

When \(l=1\), the H-MISG degenerates into the H-SG algorithm.

The H-MISG algorithm in (32)–(44) can be implemented by the following steps; a code sketch of the recursion is given after the list.

  1. Set the innovation length l and initialize: let \(k=1\), \(\hat{\varvec{\varTheta }}_0=[\hat{\varvec{\alpha }}^{\tiny \mathrm{T}}_0,\hat{\varvec{\beta }}^{\tiny \mathrm{T}}_0]^{\tiny \mathrm{T}} =\mathbf{1}_{2n}/p_0\), \(\hat{\xi }_0=1/p_0\), \(p_0=10^6\), \(r_{1,0}=1\) and \(r_{2,0}=1\), and give an error tolerance \(\eta >0\).

  2. Collect the measurement data \(x_k\), form the stacked information vector \(\varvec{X}(l)\) by (35), the information vectors \(\varvec{X}_k\) and \(\varvec{\phi }(\hat{\xi }_{k-1},k)\) by (38) and (37), and \(\varvec{\varPhi }(l,\hat{\xi }_{k-1})\) by (36).

  3. Compute the reciprocal of the step-size \(r_{1,k}\) by (34) and the innovation vector \(\varvec{E}_1(l)\) by (33).

  4. Update the parameter estimation vector \(\hat{\varvec{\varTheta }}_k\) by (32), and read out \(\hat{\varvec{\alpha }}_k\) and \(\hat{\varvec{\beta }}_k\) from (39).

  5. Form the derivative of \(\varvec{\phi }(\hat{\xi }_{k-1},k)\) by (43), and \(\varvec{\varPhi }'(l,\hat{\xi }_{k-1})\) by (44).

  6. Compute the reciprocal of the step-size \(r_{2,k}\) by (42) and the innovation vector \(\varvec{E}_2(l)\) by (41).

  7. Update the parameter estimate \(\hat{\xi }_{k}\) by (40).

  8. Compare \(\{\hat{\varvec{\varTheta }}_k,\hat{\xi }_{k}\}\) with \(\{\hat{\varvec{\varTheta }}_{k-1},\hat{\xi }_{k-1}\}\): if \(\Vert \hat{\varvec{\varTheta }}_k-\hat{\varvec{\varTheta }}_{k-1}\Vert +\Vert \hat{\xi }_{k}-\hat{\xi }_{k-1}\Vert >\eta \), increase k by 1 and return to Step 2; otherwise, stop this computational process.
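A corresponding sketch of the H-MISG recursion (32)–(44) follows, again reusing the hypothetical phi and dphi helpers; the optional arguments lam1 and lam2 anticipate the forgetting factors introduced in Remark 2 below, and lam1 = lam2 = 1 gives the plain H-MISG.

```python
import numpy as np

def hmisg(x, n, l, lam1=1.0, lam2=1.0, p0=1e6, eta=1e-8):
    """H-MISG (and, for forgetting factors below one, FF-H-MISG) estimation, Eqs. (32)-(46)."""
    Theta, xi = np.ones(2 * n) / p0, 1.0 / p0
    r1 = r2 = 1.0
    for k in range(n + l - 1, len(x)):
        Phi = np.column_stack([phi(x, k - j, xi, n) for j in range(l)])     # (36)
        dPhi = np.column_stack([dphi(x, k - j, xi, n) for j in range(l)])   # (44)
        Xl = np.array([x[k - j] for j in range(l)])                         # (35)
        r1 = lam1 * r1 + Phi[:, 0] @ Phi[:, 0]                              # (34) / (45)
        E1 = Xl - Phi.T @ Theta                                             # (33)
        Theta_new = Theta + Phi @ E1 / r1                                   # (32)
        g = Theta_new @ dPhi[:, 0]                                          # Theta_k^T phi'(xi_{k-1}, k)
        r2 = lam2 * r2 + g ** 2                                             # (42) / (46)
        E2 = Xl - Phi.T @ Theta_new                                         # (41)
        xi_new = xi + (Theta_new @ dPhi) @ E2 / r2                          # (40)
        done = np.linalg.norm(Theta_new - Theta) + abs(xi_new - xi) <= eta
        Theta, xi = Theta_new, xi_new
        if done:
            break
    return Theta, xi
```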

Remark 2

In order to obtain more accurate parameter estimates but not increase the computational cost of the H-MISG algorithm, we introduce the forgetting factors (FF) \(\lambda _1\) and \(\lambda _2\) into (34) and (42):

$$\begin{aligned} r_{1,k}= & {} \lambda _1r_{1,k-1}+\Vert \varvec{\phi }(\hat{\xi }_{k-1},k)\Vert ^2, \quad 0<\lambda _1\le 1, \end{aligned}$$
(45)
$$\begin{aligned} r_{2,k}= & {} \lambda _2r_{2,k-1}+\Vert \hat{\varvec{\varTheta }}^{\tiny \mathrm{T}}_k \varvec{\phi }'(\hat{\xi }_{k-1},k)\Vert ^2, \quad 0<\lambda _2\le 1.\nonumber \\ \end{aligned}$$
(46)

Replacing (34) and (42) in the H-MISG algorithm with (45) and (46), we obtain the variant of the H-MISG, i.e., the FF-H-MISG algorithm for the ExpAR model. When \(\lambda _1=1\) and \(\lambda _2=1\), the FF-H-MISG degenerates into the H-MISG algorithm.
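With the hypothetical hmisg sketch of the previous listing, this modification affects only the two step-size recursions, so the FF-H-MISG is obtained by passing forgetting factors smaller than one, for example (for a data array x and a known degree n):

```python
# FF-H-MISG: (45)-(46) replace (34) and (42); lam1 = lam2 = 1 recovers the plain H-MISG
Theta_hat, xi_hat = hmisg(x, n, l=5, lam1=0.91, lam2=1.00)
```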

Table 2 The H-SG estimates and errors (\(\sigma ^2=0.20^2\))
Table 3 The H-MISG estimates and errors (\(\sigma ^2=0.20^2\), \(l=5\))

Remark 3

Before using the proposed algorithms to identify the ExpAR model, we need to determine the degree n from the observed data by using order estimation methods, such as the orthogonalization procedure and the correlation analysis in [22].

At each recursion, the H-SG algorithm uses only the current measurement and innovation, whereas the H-MISG and FF-H-MISG algorithms use the current and the preceding \((l-1)\) measurements and innovations, which gives the latter a higher parameter estimation accuracy.

5 Example

Consider the following ExpAR time series

$$\begin{aligned} x_k= & {} \left( \alpha _1+\beta _1\mathrm{e}^{-\xi x^2_{k-1}}\right) x_{k-1} +\left( \alpha _2+\beta _2\mathrm{e}^{-\xi x^2_{k-1}}\right) x_{k-2}\\&+\cdots +\left( \alpha _n+\beta _n\mathrm{e}^{-\xi x^2_{k-1}}\right) x_{k-n}+\varepsilon _k \\= & {} \left( 1.25+2.00\mathrm{e}^{-2.30 x^2_{k-1}}\right) x_{k-1}\\&+\left( -0.28+1.85\mathrm{e}^{-2.30 x^2_{k-1}}\right) x_{k-2} +\varepsilon _k. \end{aligned}$$

The parameters to be identified are

$$\begin{aligned} \varvec{\varTheta }= & {} [\alpha _1,\alpha _2,\beta _1,\beta _2]^{\tiny \mathrm{T}}\\= & {} [1.25,-0.28,2.00,1.85]^{\tiny \mathrm{T}}, \quad \xi =2.30. \end{aligned}$$

In the simulation, the variance of the white noise \(\{\varepsilon _k\}\) is set to \(\sigma ^2\) and the measurement data length is taken as \(L_e=3000\). For simplicity, we define \(\varvec{\vartheta }:=[\varvec{\varTheta }^{\tiny \mathrm{T}},\xi ]^{\tiny \mathrm{T}}\).
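For illustration, this simulation setup can be reproduced with the hypothetical sketches from the previous sections (the noise realization, and hence the exact numbers in the tables, will of course differ):

```python
import numpy as np

alpha_true, beta_true, xi_true = np.array([1.25, -0.28]), np.array([2.00, 1.85]), 2.30
theta_true = np.concatenate([alpha_true, beta_true, [xi_true]])

x = simulate_expar(alpha_true, beta_true, xi_true, N=3000, sigma=0.20)   # L_e = 3000
Theta_hat, xi_hat = hmisg(x, n=2, l=5, lam1=0.91, lam2=1.00)             # FF-H-MISG

theta_hat = np.append(Theta_hat, xi_hat)
delta = np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true) * 100   # error in %
```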

Table 4 The FF-H-MISG estimates and errors (\(\sigma ^2=0.20^2\), \(l=5\), \(\lambda _1=0.91\), \(\lambda _2=1.00\))
Fig. 2 The H-SG, H-MISG and FF-H-MISG estimation errors \(\delta \) versus k

Table 5 The FF-H-MISG estimates and errors (\(\sigma ^2=0.20^2\), \(\lambda _1=0.91\), \(\lambda _2=1.00\))

Taking \(\sigma ^2=0.20^2\) and using the H-SG algorithm, the H-MISG algorithm with \(l=5\) and the FF-H-MISG algorithm with \(l=5\), \(\lambda _1=0.91\) and \(\lambda _2=1.00\) to identify this ExpAR model, the parameter estimates and their errors are shown in Tables 2, 3 and 4, and the parameter estimation errors \(\delta :=\Vert \hat{\varvec{\vartheta }}_k-\varvec{\vartheta }\Vert /\Vert \varvec{\vartheta }\Vert \times 100\%\) versus k are shown in Fig. 2.

To illustrate the advantage of the proposed multi-innovation identification algorithms, we fix the noise variance \(\sigma ^2=0.20^2\), the forgetting factors \(\lambda _1=0.91\) and \(\lambda _2=1.00\), and adopt the FF-H-MISG algorithm to identify this ExpAR model with the innovation length \(l=5\), \(l=6\) and \(l=7\). The corresponding results are demonstrated in Table 5 and Fig. 3.

To demonstrate how the performance of the proposed FF-H-MISG algorithm depends on the forgetting factors, we fix the noise variance \(\sigma ^2=0.20^2\), the innovation length \(l=7\), the forgetting factor \(\lambda _2=1.00\), and adopt the FF-H-MISG algorithm to identify this ExpAR model with the forgetting factor \(\lambda _1=0.91\), \(\lambda _1=0.97\) and \(\lambda _1=0.99\). The corresponding results are exhibited in Table 6 and Fig. 4.

To show the influence of the noise level on the proposed FF-H-MISG algorithm, we fix the innovation length \(l=7\), the forgetting factors \(\lambda _1=0.91\), \(\lambda _2=1.00\), and adopt the FF-H-MISG algorithm to identify this ExpAR model with the noise variance \(\sigma ^2=0.20^2\), \(\sigma ^2=0.23^2\) and \(\sigma ^2=0.26^2\). The results are shown in Table 7 and Fig. 5.

Fig. 3 The FF-H-MISG estimation errors \(\delta \) versus k (\(\sigma ^2=0.20^2\), \(\lambda _1=0.91\), \(\lambda _2=1.00\))

From Tables 2, 3, 4, 5, 6 and 7 and Figs. 2, 3, 4 and 5, we draw the following conclusions.

  • The parameter estimation errors decrease as the data length k increases for all the algorithms proposed in this paper. The FF-H-MISG algorithm has the highest parameter estimation accuracy among these three algorithms—see Tables 2, 3, 4 and Fig. 2.

  • The parameter estimation accuracy of the FF-H-MISG algorithm improves as the innovation length l increases and as the forgetting factor \(\lambda _1\) decreases (see Tables 5, 6 and Figs. 3, 4).

  • The estimation errors of the FF-H-MISG algorithm decrease as the noise level decreases (see Table 7 and Fig. 5).

  • The proposed FF-H-MISG algorithm with appropriate innovation length and forgetting factors is effective for identifying the nonlinear ExpAR process (see Tables 5, 6 and Figs. 3, 4).

For model validation, we use \(L_r=200\) observations from \(k=L_e+1\) to \(k=L_e+L_r\) together with the model estimated by the FF-H-MISG algorithm with \(\lambda _1=0.91\), \(\lambda _2=1.00\) and \(l=7\). The predicted data \(\hat{x}_k\) and the measurement data \(x_k\) are plotted in Fig. 6. To evaluate the prediction performance, we define and compute the root mean square error (RMSE)

$$\begin{aligned} \mathrm{RMSE}:=\left[ \frac{1}{L_r}\sum \limits _{k=L_e+1}^{L_e+L_r} (\hat{x}_k-x_k)^2\right] ^{1/2}=0.19635. \end{aligned}$$

From Fig. 6, we can see that the predicted data are close to the measurement data, which means that the estimated model can capture the dynamics of this ExpAR process.
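A sketch of this validation step, assuming the simulated series x contains \(L_e+L_r\) samples and that Theta_hat and xi_hat are the FF-H-MISG estimates obtained from the first \(L_e\) samples, is:

```python
Le, Lr, n = 3000, 200, 2
# one-step-ahead predictions x_hat_k = phi(xi_hat, k)^T Theta_hat over the validation window
x_hat = np.array([phi(x, k, xi_hat, n) @ Theta_hat for k in range(Le, Le + Lr)])
rmse = np.sqrt(np.mean((x_hat - x[Le:Le + Lr]) ** 2))
```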

Table 6 The FF-H-MISG estimates and errors (\(\sigma ^2=0.20^2\), \(l=7\), \(\lambda _2=1.00\))
Fig. 4 The FF-H-MISG estimation errors \(\delta \) versus k (\(\sigma ^2=0.20^2\), \(l=7\), \(\lambda _2=1.00\))

Table 7 The FF-H-MISG estimates and errors (\(l=7\), \(\lambda _1=0.91\), \(\lambda _2=1.00\))
Fig. 5 The FF-H-MISG estimation errors \(\delta \) versus k (\(l=7\), \(\lambda _1=0.91\), \(\lambda _2=1.00\))

Fig. 6 The predicted data \(\hat{x}_k\) and the measurement data \(x_k\) for the FF-H-MISG algorithm

6 Conclusions

Applying the hierarchical identification principle and the multi-innovation identification theory, this paper derives an H-SG algorithm and an H-MISG algorithm for the ExpAR model. To further improve the estimation accuracy, two forgetting factors are introduced into the H-MISG, yielding a variant, the FF-H-MISG algorithm. The simulation results demonstrate that the FF-H-MISG algorithm with appropriate innovation length and forgetting factors is effective for identifying the ExpAR model. Combined with other methods [23], such as neural networks [24, 25] and kernel collocation [26, 27], the algorithms proposed in this paper can be exploited to study parameter identification of different systems and can be applied to other fields [28,29,30,31].