1 Introduction

Fractional calculus now appears in a wide and growing range of physical systems and industrial processes, such as viscous materials [1], fluid mechanics [2], and advanced materials [3]. Many complex physical and industrial systems are difficult to describe with traditional integer-order calculus, but introducing fractional calculus largely overcomes this difficulty. Fractional-order systems therefore offer clear advantages over integer-order systems in modeling, identification and control [4,5,6].

However, a known system model and parameters are prerequisites for accurate control. Therefore, before controlling a system, we first need to obtain its model structure and parameters by means of system identification. Modeling and identification are particularly important for practical systems such as batteries; hence, many scholars have explored and comprehensively summarized modeling and identification methods for lithium-ion batteries [7, 8]. Actual systems often exhibit strong nonlinear characteristics, and block-structured models such as the Hammerstein, Wiener and Hammerstein-Wiener models connect static nonlinear blocks in series with a dynamic linear block, so that the nonlinear characteristics of a system can be described.

A large number of researchers have addressed the modeling and identification of fractional-order systems [9,10,11,12]. Zhang et al. studied the identification of fractional-order systems with colored noise, combined the multi-innovation principle with the Levenberg-Marquardt algorithm, and successfully applied the result to a fractional-order Hammerstein system [13]. On the basis of this work, the same researchers adopted a separation-based identification method for fractional-order Hammerstein systems, using a neuro-fuzzy model to fit the nonlinear part of the Hammerstein model and thus converting the identification of the nonlinear system into a completely linear problem [14]. Rahmani et al. combined the Lyapunov method with a linear optimization algorithm and applied it to the modeling and identification of a fractional-order Hammerstein model [15]. Other scholars have combined intelligent optimization algorithms with classical identification algorithms for the identification of fractional-order systems [16,17,18]. Hammar et al. applied the particle swarm optimization algorithm to the parameter identification of the state-space model of a fractional-order Hammerstein system [19]. Sersour et al. then extended particle swarm optimization into an adaptive-velocity particle swarm optimization algorithm, which can successfully identify fractional-order discrete Wiener systems [20].

Most of the literature has focused on fractional-order Hammerstein and Wiener systems. These two structures are simpler than a Hammerstein-Wiener system and therefore find it more difficult to fully express the strong nonlinearity of more complex systems. It is thus important to explore an identification method suitable for fractional-order Hammerstein-Wiener systems.

A comprehensive analysis of the identification methods for the transfer functions of fractional-order systems in the above literature reveals the following main problems: 1) there are many parameters and variables to be identified, and they are coupled with each other; 2) the estimation of fractional orders is difficult; 3) the algorithms converge slowly; and 4) the algorithms may not always converge successfully. Therefore, this paper proposes a hybrid parameter identification algorithm based on the multi-innovation principle. Previous scholars [21,22,23,24] have proposed multi-innovation least-mean-square algorithms based on the multi-innovation identification principle. The basic idea of multi-innovation identification is to expand the scalar innovation into a multi-innovation vector, and the innovation vector into an innovation matrix, so that both current and past data can be used. Multi-innovation identification algorithms have been shown to improve both convergence and the accuracy of parameter estimation; therefore, this paper introduces the multi-innovation principle into the proposed identification algorithm. The main contributions of this paper are as follows: 1) A fractional-order discrete Hammerstein-Wiener system model is constructed. 2) A multi-innovation recursive gradient descent algorithm is designed to estimate the model parameters, with an information vector composed of fractional-order variables as input, and a multi-innovation Levenberg-Marquardt algorithm is adapted to estimate the fractional order; the modeling efficiency is improved by using the two multi-innovation algorithms interactively. 3) The performance of the proposed algorithm is analyzed on the basis of a convergence theorem.

The remainder of this paper is organized as follows: Section 2 introduces background knowledge on fractional calculus and describes the fractional-order Hammerstein-Wiener system model. Section 3 introduces the multi-innovation gradient descent algorithm used to estimate the parameters of the fractional-order nonlinear model. Section 4 proposes the multi-innovation Levenberg-Marquardt algorithm used to estimate the fractional order. Section 5 applies the convergence theorem to analyze the performance of the proposed algorithm. Section 6 presents simulation and experimental case studies. Finally, a summary of this research and an outlook on future work are given.

2 Problem formulation and preliminaries

2.1 Fractional calculus

From different perspectives, researchers have obtained several common forms of fractional calculus operators, including the Grünwald-Letnikov (GL) fractional operators [25], the Riemann-Liouville (RL) fractional operators [26] and the Caputo fractional operators [27]. The GL-type fractional operator for a discrete system that is used in this paper is defined as follows:

$$ {\Delta}^{\overline{\alpha}}x(kh)=\frac{1}{h^{\overline{\alpha}}}\sum \limits_{j=0}^k{\left(-1\right)}^j\left(\begin{array}{c}\overline{\alpha}\\ {}j\end{array}\right)x\left(\left(k-j\right)h\right) $$
(1)

where \( 0<\overline{\alpha}<1 \) is the fractional order; k and h denote the sample index and the sampling period, respectively; and \( \left(\begin{array}{l}\overline{\alpha}\\ {}j\end{array}\right) \) is defined as follows:

$$ \left(\begin{array}{c}\overline{\alpha}\\ {}j\end{array}\right)=\left\{\begin{array}{cc}1&\ \mathrm{for}\ j=0\\ {}\frac{\overline{\alpha}\left(\overline{\alpha}-1\right)\cdots \left(\overline{\alpha}-j+1\right)}{j!}&\ \mathrm{for}\ j>0\end{array}\right. $$
(2)

This can be written in recursive form as:

$$ \left\{\begin{array}{l}\beta (0)=1\\ {}\beta (j)=\beta \left(j-1\right)\frac{j-\overline{\alpha}-1}{j}\kern1em \mathrm{for}\ j=1,\dots, k\end{array}\right. $$
(3)

where \( \beta (j)={\left(-1\right)}^j\left(\begin{array}{c}\overline{\alpha}\\ {}j\end{array}\right) \). To facilitate simulation and concise expression, Eq. (1) can be written as follows according to Eqs. (2) and (3):

$$ {\Delta}^{\overline{\alpha}}x(kh)=\frac{1}{h^{\overline{\alpha}}}\sum \limits_{j=0}^k\beta (j)x\left(\left(k-j\right)h\right) $$
(4)

Under the assumption that the sampling period is h = 1, Eq. (4) simplifies to:

$$ {\Delta}^{\overline{\alpha}}x(k)=\sum \limits_{j=0}^k\beta (j)x\left(k-j\right) $$
(5)

In this paper, the fractional calculus operator expressed in Eq. (5) will be used.
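As an illustration, Eqs. (3) and (5) translate directly into code. The following Python sketch (the function names are ours, for illustration only) computes the β(j) coefficients recursively and evaluates the GL fractional difference with h = 1:

```python
import numpy as np

def gl_coefficients(alpha: float, k: int) -> np.ndarray:
    """beta(j) = (-1)^j * binom(alpha, j) for j = 0..k, via the recursion in Eq. (3)."""
    beta = np.empty(k + 1)
    beta[0] = 1.0
    for j in range(1, k + 1):
        beta[j] = beta[j - 1] * (j - alpha - 1) / j
    return beta

def gl_difference(x: np.ndarray, alpha: float, k: int) -> float:
    """GL fractional difference Delta^alpha x(k) of Eq. (5), sampling period h = 1."""
    beta = gl_coefficients(alpha, k)
    # sum_{j=0}^{k} beta(j) * x(k - j): reverse the first k+1 samples of x
    return float(np.dot(beta, x[k::-1]))

# Example: fractional difference of order 0.6 of a unit step
x = np.ones(50)
print(gl_difference(x, 0.6, 49))
```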

2.2 Fractional-order linear model

For fractional-order systems, there are different linear model descriptions. This paper considers the following linear discrete transfer function model:

$$ y(k)=G(z)u(k)=\frac{B(z)}{A(z)}u(k) $$
(6)

where u(k) and y(k) are the system input and output, respectively. A(z) and B(z) are the denominator and numerator polynomials of the transfer function, \( A(z)={a}_1{z}^{-{\alpha}_1}+{a}_2{z}^{-{\alpha}_2}+\cdots +{a}_{n_a}{z}^{-{\alpha}_{n_a}} \) and \( B(z)={b}_1{z}^{-{\gamma}_1}+{b}_2{z}^{-{\gamma}_2}+\cdots +{b}_{n_b}{z}^{-{\gamma}_{n_b}} \), where αi and γj (i = 1, 2, …, na; j = 1, 2, …, nb) are the fractional orders of the corresponding polynomials, satisfying αi ∈ R+ and γj ∈ R+, and z−1 is the backshift operator; that is, z−1y(k) = y(k − 1).

Model (6) can be written in the following form:

$$ \sum \limits_{i=1}^{n_a}{a}_i{z}^{-{\alpha}_i}y(k)=\sum \limits_{i=1}^{n_b}{b}_i{z}^{-{\gamma}_i}u(k) $$
(7)

When the fractional orders of the denominator and numerator polynomials in (7) are completely different, the fractional-order model is called a nonhomogeneous system; when the orders have the form \( {\alpha}_i=i-\overline{\alpha} \) and \( {\gamma}_j=j-\overline{\alpha} \) \( \left(i=1,2,\dots, {n}_a;j=1,2,\dots, {n}_b\right) \), where \( \overline{\alpha} \) is the order factor, the model is a same-dimensional (commensurate) system. The same-dimensional case is considered in this paper.

Under this assumption, Eq. (7) can be written as:

$$ \sum \limits_{i=1}^{n_a}{a}_i{z}^{-i+\overline{\alpha}}y(k)=\sum \limits_{i=1}^{n_b}{b}_i{z}^{-i+\overline{\alpha}}u(k) $$
(8)

By introducing the fractional backward-shift operator \( {z}^{-i+\overline{\alpha}}x(k)={\Delta}^{\overline{\alpha}}x\left(k-i\right) \), Eq. (8) can be expressed by means of the discrete fractional operator Δ as:

$$ \sum \limits_{i=1}^{n_a}{a}_i{\Delta}^{\overline{\alpha}}y\left(k-i\right)=\sum \limits_{i=1}^{n_b}{b}_i{\Delta}^{\overline{\alpha}}u\left(k-i\right) $$
(9)

We will use Eq. (9) to describe the linear part of the fractional-order Hammerstein-Wiener model.

2.3 Problem description

In a block-structured nonlinear model, dynamic linear and static nonlinear blocks are connected in series, in parallel or in a feedback structure. Such a model can describe the nonlinear behavior of an actual process well. The general structure of a Hammerstein-Wiener system is shown in Fig. 1, where a linear dynamic module is surrounded by two static nonlinear modules at its input and output.

Fig. 1 Hammerstein-Wiener nonlinear system structure diagram

Specifically, the input-output relationship of the Hammerstein-Wiener system can be expressed as:

$$ y(k)=A(z)g\left(y(k)\right)+B(z)f\left(u(k)\right)+v(k) $$
(10)

In this equation, u(k) and y(k) are the input and output of the system, respectively, and v(k) is the external noise of the system. The nonlinear links can be represented by two static nonlinear functions f(⋅) and g(⋅). The linear elements are represented by the polynomials A(z) and B(z) containing the backshift operator z−1, i.e.,

$$ {\displaystyle \begin{array}{c}A(z)={a}_1{z}^{-{\alpha}_1}+{a}_2{z}^{-{\alpha}_2}+\cdots +{a}_{n_a}{z}^{-{\alpha}_{n_a}}=\sum \limits_{i=1}^{n_a}{a}_i{z}^{-{\alpha}_i}\\ {}B(z)={b}_1{z}^{-{\gamma}_1}+{b}_2{z}^{-{\gamma}_2}+\cdots +{b}_{n_b}{z}^{-{\gamma}_{n_b}}=\sum \limits_{i=1}^{n_b}{b}_i{z}^{-{\gamma}_i}\end{array}} $$
(11)

The two static nonlinear functions f(⋅) and g(⋅) are nonlinear functions composed of several known basis functions, as follows:

$$ {\displaystyle \begin{array}{l}f\left(u(k)\right)={p}_1{f}_1\left(u(k)\right)+{p}_2{f}_2\left(u(k)\right)+\cdots +{p}_{n_p}{f}_{n_p}\left(u(k)\right)\\ {}\kern3.25em =\sum \limits_{i=1}^{n_p}{p}_i{f}_i\left(u(k)\right)\end{array}} $$
(12)
$$ {\displaystyle \begin{array}{l}g\left(y(k)\right)={q}_1{g}_1\left(y(k)\right)+{q}_2{g}_2\left(y(k)\right)+\cdots +{q}_{n_q}{g}_{n_q}\left(y(k)\right)\\ {}\kern3.25em =\sum \limits_{i=1}^{n_q}{q}_i{g}_i\left(y(k)\right)\end{array}} $$
(13)

where \( {f}_1\left(\cdot \right),\dots, {f}_{n_p}\left(\cdot \right) \) are np known basis functions and \( {g}_1\left(\cdot \right),\dots, {g}_{n_q}\left(\cdot \right) \) are nq known basis functions.

Substituting the above two equations into Eq. (10), we obtain:

$$ y(k)=\sum \limits_{i=1}^{n_a}{a}_i{z}^{-{\alpha}_i}g\left(y(k)\right)+\sum \limits_{i=1}^{n_b}{b}_i{z}^{-{\gamma}_i}f\left(u(k)\right)+v(k) $$
(14)

Substituting Eqs. (12) and (13) into Eq. (14) yields the following expression for the description of the entire system:

$$ y(k)=\sum \limits_{i=1}^{n_a}{a}_i{z}^{-{\alpha}_i}\sum \limits_{j=1}^{n_q}{q}_j{g}_j\left(y(k)\right)+\sum \limits_{i=1}^{n_b}{b}_i{z}^{-{\gamma}_i}\sum \limits_{j=1}^{n_p}{p}_j{f}_j\left(u(k)\right)+v(k) $$
(15)

In this paper, the fractional-order discrete system is considered to be same-dimensional; that is, \( {\alpha}_i=i-\overline{\alpha} \) and \( {\gamma}_j=j-\overline{\alpha} \), with Δ representing the discrete fractional difference operator. Thus, Eq. (15) can be written as:

$$ {\displaystyle \begin{array}{c}y(k)=\sum \limits_{i=1}^{n_a}{a}_i\sum \limits_{j=1}^{n_q}{q}_j{\varDelta}^{\overline{\alpha}}{g}_j\left(y\left(k-i\right)\right)+\sum \limits_{i=1}^{n_b}{b}_i\sum \limits_{j=1}^{n_p}{p}_j{\varDelta}^{\overline{\alpha}}{f}_j\left(u\left(k-i\right)\right)+v(k)\\ {}\kern2em ={q}_1\sum \limits_{i=1}^{n_a}{a}_i{\varDelta}^{\overline{\alpha}}{g}_1\left(y\left(k-i\right)\right)+\cdots +{q}_{n_q}\sum \limits_{i=1}^{n_a}{a}_i{\varDelta}^{\overline{\alpha}}{g}_{n_q}\left(y\left(k-i\right)\right)\\ {}\kern2.36em +{p}_1\sum \limits_{i=1}^{n_b}{b}_i{\varDelta}^{\overline{\alpha}}{f}_1\left(u\left(k-i\right)\right)+\cdots +{p}_{n_p}\sum \limits_{i=1}^{n_b}{b}_i{\varDelta}^{\overline{\alpha}}{f}_{n_p}\left(u\left(k-i\right)\right)+v(k)\end{array}} $$
(16)

The parameter vectors are defined as:

$$ {\displaystyle \begin{array}{c}\mathbf{a}={\left[{a}_1,{a}_2,\dots, {a}_{n_a}\right]}^{\mathrm{T}}\in {R}^{n_a},\kern1em \mathbf{b}={\left[{b}_1,{b}_2,\dots, {b}_{n_b}\right]}^{\mathrm{T}}\in {R}^{n_b},\\ {}\mathbf{p}={\left[{p}_1,{p}_2,\dots, {p}_{n_p}\right]}^{\mathrm{T}}\in {R}^{n_p},\kern1em \mathbf{q}={\left[{q}_1,{q}_2,\dots, {q}_{n_q}\right]}^{\mathrm{T}}\in {R}^{n_q}\end{array}} $$
(17)

To obtain unique model parameters, the parameters of the model are normalized. For this purpose, the first coefficient of each of the two nonlinear modules is fixed; that is, the first elements of the parameter vectors p and q are set to p1 = 1 and q1 = 1. On this basis, Eq. (16) can be rewritten as:

$$ {\displaystyle \begin{array}{c}y(k)=\sum \limits_{i=1}^{n_a}{a}_i{\varDelta}^{\overline{\alpha}}{g}_1\left(y\left(k-i\right)\right)+{q}_2\sum \limits_{i=1}^{n_a}{a}_i{\varDelta}^{\overline{\alpha}}{g}_2\left(y\left(k-i\right)\right)+\\ {}\cdots +{q}_{n_q}\sum \limits_{i=1}^{n_a}{a}_i{\varDelta}^{\overline{\alpha}}{g}_{n_q}\left(y\left(k-i\right)\right)+\sum \limits_{i=1}^{n_b}{b}_i{\varDelta}^{\overline{\alpha}}{f}_1\left(u\left(k-i\right)\right)+\\ {}{p}_2\sum \limits_{i=1}^{n_b}{b}_i{\varDelta}^{\overline{\alpha}}{f}_2\left(u\left(k-i\right)\right)+\cdots +{p}_{n_p}\sum \limits_{i=1}^{n_b}{b}_i{\varDelta}^{\overline{\alpha}}{f}_{n_p}\left(u\left(k-i\right)\right)+v(k)\end{array}} $$
(18)

According to the definition of each parameter vector in Eq. (17), Eq. (18) can be written in linear regression form as follows:

$$ y(k)={\boldsymbol{\upvarphi}}^{\mathrm{T}}\left(k,\overline{\alpha}\right)\boldsymbol{\uptheta} +v(k) $$
(19)

where \( \boldsymbol{\upvarphi} \left(k,\overline{\alpha}\right) \) is the information vector, which is

$$ \boldsymbol{\upvarphi} \left(k,\overline{\alpha}\right)=\left[\begin{array}{c}\boldsymbol{\uppsi} \left(k,\overline{\alpha}\right)\\ {}\boldsymbol{\upzeta} \left(k,\overline{\alpha}\right)\end{array}\right]\in {R}^n,n={n}_q\times {n}_a+{n}_p\times {n}_b $$
(20)

where where \( \boldsymbol{\uppsi} \left(k,\overline{\alpha}\right)={\left[{\boldsymbol{\uppsi}}_1^{\mathrm{T}}\left(k,\overline{\alpha}\right),{\boldsymbol{\uppsi}}_2^{\mathrm{T}}\left(k,\overline{\alpha}\right),\cdots, {\boldsymbol{\uppsi}}_{n_q}^{\mathrm{T}}\left(k,\overline{\alpha}\right)\right]}^{\mathrm{T}}\in {R}^{n_q\times {n}_a} \);

\( {\boldsymbol{\uppsi}}_i\left(k,\overline{\alpha}\right)={\left[{\Delta}^{\overline{\alpha}}{g}_i\left(y\left(k-1\right)\right),\cdots, {\Delta}^{\overline{\alpha}}{g}_i\left(y\left(k-{n}_a\right)\right)\right]}^{\mathrm{T}},i=1,2,\dots, {n}_q \);

\( \boldsymbol{\upzeta} \left(k,\overline{\alpha}\right)={\left[{\boldsymbol{\upzeta}}_1^{\mathrm{T}}\left(k,\overline{\alpha}\right),{\boldsymbol{\upzeta}}_2^{\mathrm{T}}\left(k,\overline{\alpha}\right),\cdots, {\boldsymbol{\upzeta}}_{n_p}^{\mathrm{T}}\left(k,\overline{\alpha}\right)\right]}^{\mathrm{T}}\in {R}^{n_p\times {n}_b} \); and

\( {\boldsymbol{\upzeta}}_j\left(k,\overline{\alpha}\right)={\left[{\Delta}^{\overline{\alpha}}{f}_j\left(u\left(k-1\right)\right),\cdots, {\Delta}^{\overline{\alpha}}{f}_j\left(u\left(k-{n}_b\right)\right)\right]}^{\mathrm{T}},j=1,2,\dots, {n}_p \).

θ is the unknown parameter vector, which is

$$ \boldsymbol{\uptheta} ={\left[\mathbf{a},{q}_2\mathbf{a},\cdots, {q}_{n_q}\mathbf{a},\mathbf{b},{p}_2\mathbf{b},\cdots, {p}_{n_p}\mathbf{b}\right]}^{\mathrm{T}}={\left[\mathbf{q}\otimes \mathbf{a},\mathbf{p}\otimes \mathbf{b}\right]}^{\mathrm{T}}\in {R}^n, $$

where ⊗ is the Kronecker product or direct product, defined as follows: given A = [aij] ∈ Rm × n and B = [bij] ∈ Rp × q, A ⊗ B = [aijB] ∈ R(mp) × (nq).

In the following sections, an identification method is designed in accordance with Eq. (19) to estimate the unknown parameter vectors a, b, p, q and the fractional order \( \overline{\alpha} \) in a fractional-order model.
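To make the regression structure of Eqs. (19)-(20) concrete, the following sketch assembles φ(k, ᾱ) from recorded data and user-supplied basis functions. All names are illustrative, and k ≥ max(na, nb) is assumed so that every delayed index is valid:

```python
import numpy as np

def gl_difference(x, alpha, k):
    """Delta^alpha x(k) of Eq. (5), h = 1."""
    beta = np.empty(k + 1)
    beta[0] = 1.0
    for j in range(1, k + 1):
        beta[j] = beta[j - 1] * (j - alpha - 1) / j
    return float(np.dot(beta, x[k::-1]))

def info_vector(u, y, k, alpha, g_basis, f_basis, na, nb):
    """phi(k, alpha) of Eq. (20): the psi-blocks over g_i(y) followed by
    the zeta-blocks over f_j(u); length n = nq*na + np*nb."""
    psi = [gl_difference(g(y), alpha, k - i)
           for g in g_basis for i in range(1, na + 1)]
    zeta = [gl_difference(f(u), alpha, k - i)
            for f in f_basis for i in range(1, nb + 1)]
    return np.array(psi + zeta)

# Example with the bases used in Section 6: g = {y, y^2, y^3}, f = {u, u^2}
g_basis = [lambda y: y, lambda y: y ** 2, lambda y: y ** 3]
f_basis = [lambda u: u, lambda u: u ** 2]
```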

3 Model parameter identification based on the multi-innovation identification principle

To enable the estimation of the parameters of Model (19), an objective function is first given:

$$ J\left(\boldsymbol{\uptheta} \right)={\left[y(k)-{\boldsymbol{\upvarphi}}^{\mathrm{T}}\left(k,\overline{\alpha}\right)\boldsymbol{\uptheta} \right]}^2 $$
(21)

Minimizing the objective function (21) with respect to the estimated parameter vector yields the following stochastic gradient descent algorithm:

$$ \hat{\boldsymbol{\uptheta}}(k)=\hat{\boldsymbol{\uptheta}}\left(k-1\right)+\mu (k)\boldsymbol{\upvarphi} \left(k,\overline{\alpha}\right)\left[y(k)-{\boldsymbol{\upvarphi}}^{\mathrm{T}}\Big(k,\overline{\alpha}\Big)\hat{\boldsymbol{\uptheta}}\left(k-1\right)\right] $$
(22)

where \( \hat{\boldsymbol{\uptheta}}(k) \) is the estimated parameter vector at the kth instance of sampling. Notably, the fractional order \( \overline{\alpha} \) in (22) is unknown, and the information vector \( \boldsymbol{\upvarphi} \left(k,\overline{\alpha}\right) \) cannot be obtained, meaning that algorithm (22) cannot be used. To overcome this problem, the real fractional order \( \overline{\alpha} \) is replaced with the fractional order estimate \( \hat{\overline{\alpha}} \); then, by taking μ(k) = 1/r(k), with \( r(k)=r\left(k-1\right)+{\left\Vert \hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right)\right\Vert}^2 \), the gradient descent algorithm can be written as:

$$ \hat{\boldsymbol{\uptheta}}(k)=\hat{\boldsymbol{\uptheta}}\left(k-1\right)+\frac{\hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right)}{r(k)}\left[y(k)-{\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\Big(k,\hat{\overline{\alpha}}\Big)\hat{\boldsymbol{\uptheta}}\left(k-1\right)\right] $$
(23)
$$ r(k)=r\left(k-1\right)+{\left\Vert \hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right)\right\Vert}^2,r(0)=1 $$
(24)

In this equation, \( \hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right) \) is the kth estimated information vector, which is:

$$ \hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right)=\left[\begin{array}{c}\hat{\boldsymbol{\uppsi}}\left(k,\hat{\overline{\alpha}}\right)\\ {}\hat{\boldsymbol{\upzeta}}\left(k,\hat{\overline{\alpha}}\right)\end{array}\right]\in {R}^n,n={n}_q\times {n}_a+{n}_p\times {n}_b $$
(25)

where \( \hat{\boldsymbol{\uppsi}}\left(k,\hat{\overline{\alpha}}\right)={\left[{\hat{\boldsymbol{\uppsi}}}_1^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right),{\hat{\boldsymbol{\uppsi}}}_2^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right),\cdots, {\hat{\boldsymbol{\uppsi}}}_{n_q}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\right]}^{\mathrm{T}}\in {R}^{n_q\times {n}_a} \);

\( {\hat{\boldsymbol{\uppsi}}}_i\left(k,\hat{\overline{\alpha}}\right)={\left[{\Delta}^{\hat{\overline{\alpha}}}{g}_i\left(y\left(k-1\right)\right),\cdots, {\Delta}^{\hat{\overline{\alpha}}}{g}_i\left(y\left(k-{n}_a\right)\right)\right]}^{\mathrm{T}},i=1,2,\dots, {n}_q \);

\( \hat{\boldsymbol{\upzeta}}\left(k,\hat{\overline{\alpha}}\right)={\left[{\hat{\boldsymbol{\upzeta}}}_1^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right),{\hat{\boldsymbol{\upzeta}}}_2^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right),\cdots, {\hat{\boldsymbol{\upzeta}}}_{n_p}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\right]}^{\mathrm{T}} \); and

\( {\hat{\boldsymbol{\upzeta}}}_j\left(k,\hat{\overline{\alpha}}\right)={\left[{\Delta}^{\hat{\overline{\alpha}}}{f}_j\left(u\left(k-1\right)\right),\cdots, {\Delta}^{\hat{\overline{\alpha}}}{f}_j\left(u\left(k-{n}_b\right)\right)\right]}^{\mathrm{T}},j=1,2,\dots, {n}_p \).

\( \hat{\boldsymbol{\uptheta}}(k) \) is the kth estimated parameter vector, \( \hat{\boldsymbol{\uptheta}}={\left[\hat{\mathbf{q}}\otimes \hat{\mathbf{a}},\hat{\mathbf{p}}\otimes \hat{\mathbf{b}}\right]}^{\mathrm{T}}\in {R}^n \).

The main disadvantage of the gradient descent algorithm of Eq. (23) is its slow convergence rate. To improve its performance, an innovation length is introduced and the multi-innovation identification principle is adopted. The idea is to replace the single-innovation correction with a multi-innovation correction, which can significantly improve the convergence speed of the identification algorithm. For identification, the single innovation \( e(k)=y(k)-{\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\hat{\boldsymbol{\uptheta}}\left(k-1\right) \) is expanded to a P-dimensional multi-innovation vector,

$$ {\displaystyle \begin{array}{l}\mathbf{E}\left(P,k,\hat{\overline{\alpha}}\right)=\Big[y(k)-{\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\hat{\boldsymbol{\uptheta}}\left(k-1\right),y\left(k-1\right)-{\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\left(k-1,\hat{\overline{\alpha}}\right)\hat{\boldsymbol{\uptheta}}\left(k-1\right),\dots, \\ {}\kern5em y\left(k-P+1\right)-{\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\left(k-P+1,\hat{\overline{\alpha}}\right)\hat{\boldsymbol{\uptheta}}\left(k-1\right)\Big]{}^{\mathrm{T}}\in {R}^P\end{array}} $$

The input-output information matrix \( \hat{\boldsymbol{\Phi}}\left(P,k,\hat{\overline{\alpha}}\right) \) and the stacked output vector Y(P, k) are defined as:

$$ \hat{\boldsymbol{\Phi}}\left(P,k,\hat{\overline{\alpha}}\right)=\left[\hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right),\hat{\boldsymbol{\upvarphi}}\Big(k-1,\hat{\overline{\alpha}}\Big),\dots, \hat{\boldsymbol{\upvarphi}}\Big(k-P+1,\hat{\overline{\alpha}}\Big)\right]\in {R}^{n\times P} $$
(26a)
$$ \mathbf{Y}\left(P,k\right)={\left[y(k),y\left(k-1\right),\dots, y\left(k-P+1\right)\right]}^{\mathrm{T}}\in {R}^P $$
(26b)

The P-dimensional multi-innovation error vector \( \mathbf{E}\left(P,k,\hat{\overline{\alpha}}\right) \) is expressed as:

$$ \mathbf{E}\left(P,k,\hat{\overline{\alpha}}\right)=\mathbf{Y}\left(P,k\right)-{\hat{\boldsymbol{\Phi}}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)\hat{\boldsymbol{\uptheta}}\left(k-1\right) $$
(27)

When the innovation length satisfies P = 1, because \( \hat{\boldsymbol{\Phi}}\left(1,k,\hat{\overline{\alpha}}\right)=\hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right) \) and \( \mathbf{E}\left(1,k,\hat{\overline{\alpha}}\right)=y(k)-{\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\hat{\boldsymbol{\uptheta}}\left(k-1\right) \), Eq. (23) can be equivalently expressed as:

$$ \hat{\boldsymbol{\uptheta}}(k)=\hat{\boldsymbol{\uptheta}}\left(k-1\right)+\frac{\hat{\boldsymbol{\Phi}}\left(1,k,\hat{\overline{\alpha}}\right)}{r(k)}\mathbf{E}\left(1,k,\hat{\overline{\alpha}}\right) $$
(28)

The 1-dimensional information vector \( \hat{\boldsymbol{\Phi}}\left(1,k,\hat{\overline{\alpha}}\right) \) and innovation \( \mathbf{E}\left(1,k,\hat{\overline{\alpha}}\right) \) are now replaced with a P-dimensional information matrix and multi-innovation vector, and \( r(k)=r\left(k-1\right)+{\left\Vert \hat{\boldsymbol{\Phi}}\left(P,k,\hat{\overline{\alpha}}\right)\right\Vert}^2 \) is taken; the fractional order estimate \( \hat{\overline{\alpha}} \) is obtained by means of the multi-innovation Levenberg-Marquardt algorithm discussed in Section 4. The multi-innovation recursive gradient descent (MIGD) algorithm is then expressed as follows:

$$ \hat{\boldsymbol{\uptheta}}(k)=\hat{\boldsymbol{\uptheta}}\left(k-1\right)+\frac{\hat{\boldsymbol{\Phi}}\left(P,k,\hat{\overline{\alpha}}\right)}{r(k)}\mathbf{E}\left(P,k,\hat{\overline{\alpha}}\right) $$
(29)
$$ \mathbf{E}\left(P,k,\hat{\overline{\alpha}}\right)=\mathbf{Y}\left(P,k\right)-{\hat{\boldsymbol{\Phi}}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)\hat{\boldsymbol{\uptheta}}\left(k-1\right) $$
(30)
$$ r(k)=r\left(k-1\right)+{\left\Vert \hat{\boldsymbol{\Phi}}\left(P,k,\hat{\overline{\alpha}}\right)\right\Vert}^2,r(0)=1 $$
(31)
$$ \hat{\boldsymbol{\Phi}}\left(P,k,\hat{\overline{\alpha}}\right)=\left[\hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right),\hat{\boldsymbol{\upvarphi}}\Big(k-1,\hat{\overline{\alpha}}\Big),\dots, \hat{\boldsymbol{\upvarphi}}\Big(k-P+1,\hat{\overline{\alpha}}\Big)\right]\in {R}^{n\times P} $$
(32)
$$ \mathbf{Y}\left(P,k\right)={\left[y(k),y\left(k-1\right),\dots, y\left(k-P+1\right)\right]}^{\mathrm{T}}\in {R}^P $$
(33)
$$ \hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right)=\left[\begin{array}{c}\hat{\boldsymbol{\uppsi}}\left(k,\hat{\overline{\alpha}}\right)\\ {}\hat{\boldsymbol{\upzeta}}\left(k,\hat{\overline{\alpha}}\right)\end{array}\right]\in {R}^n,n={n}_q\times {n}_a+{n}_p\times {n}_b $$
(34)
$$ \hat{\boldsymbol{\uppsi}}\left(k,\hat{\overline{\alpha}}\right)={\left[{\hat{\boldsymbol{\uppsi}}}_1^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right),{\hat{\boldsymbol{\uppsi}}}_2^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right),\cdots, {\hat{\boldsymbol{\uppsi}}}_{n_q}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\right]}^{\mathrm{T}}\in {R}^{n_q\times {n}_a} $$
(35)
$$ {\hat{\boldsymbol{\uppsi}}}_i\left(k,\hat{\overline{\alpha}}\right)={\left[{\Delta}^{\hat{\overline{\alpha}}}{g}_i\left(y\left(k-1\right)\right),\cdots, {\Delta}^{\hat{\overline{\alpha}}}{g}_i\left(y\left(k-{n}_a\right)\right)\right]}^{\mathrm{T}},i=1,2,\dots, {n}_q $$
(36)
$$ \hat{\boldsymbol{\upzeta}}\left(k,\hat{\overline{\alpha}}\right)={\left[{\hat{\boldsymbol{\upzeta}}}_1^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right),{\hat{\boldsymbol{\upzeta}}}_2^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right),\cdots, {\hat{\boldsymbol{\upzeta}}}_{n_p}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\right]}^{\mathrm{T}} $$
(37)
$$ {\hat{\boldsymbol{\upzeta}}}_j\left(k,\hat{\overline{\alpha}}\right)={\left[{\Delta}^{\hat{\overline{\alpha}}}{f}_j\left(u\left(k-1\right)\right),\cdots, {\Delta}^{\hat{\overline{\alpha}}}{f}_j\left(u\left(k-{n}_b\right)\right)\right]}^{\mathrm{T}},j=1,2,\dots, {n}_p $$
(38)

For the initial values of the parameters to be identified, \( \hat{\boldsymbol{\uptheta}}(0)={\mathbf{1}}_n/{p}_0 \) is taken, where \( {\mathbf{1}}_n \) is an n-dimensional column vector of ones and p0 is a large positive constant.
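In code, one MIGD update of Eqs. (29)-(31) can be sketched as follows; here Phi is the n × P matrix of Eq. (32), Y the stacked output vector of Eq. (33), and ‖·‖ in Eq. (31) is read as the Frobenius norm (our assumption):

```python
import numpy as np

def migd_step(theta, r, Phi, Y):
    """One multi-innovation gradient-descent update.
    theta : current parameter estimate, shape (n,)
    Phi   : information matrix, shape (n, P)  -- Eq. (32)
    Y     : stacked outputs, shape (P,)       -- Eq. (33)
    """
    E = Y - Phi.T @ theta               # Eq. (30): P-dimensional innovation
    r = r + np.linalg.norm(Phi) ** 2    # Eq. (31): r(k) = r(k-1) + ||Phi||^2
    theta = theta + (Phi @ E) / r       # Eq. (29)
    return theta, r
```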

Once the estimated parameter vector \( \hat{\boldsymbol{\uptheta}} \) has been obtained, its first na elements are the estimate of the vector \( \hat{\mathbf{a}} \), and its (nanq + 1)th to (nanq + nb)th elements are the estimate of the vector \( \hat{\mathbf{b}} \). The parameter vector \( \hat{\boldsymbol{\uptheta}} \) contains na estimates of each \( {\hat{q}}_j \) and nb estimates of each \( {\hat{p}}_j \). The final estimates are calculated by averaging:

$$ {\hat{q}}_j=\frac{1}{n_a}\sum \limits_{i=1}^{n_a}\frac{\theta_{n_a\left(j-1\right)+i}}{{\hat{a}}_i},\kern0.5em j=2,3,\dots, {n}_q $$
(39)
$$ {\hat{p}}_j=\frac{1}{n_b}\sum \limits_{i=1}^{n_b}\frac{\theta_{n_a{n}_q+\left(j-1\right){n}_b+i}}{{\hat{b}}_i},\kern0.5em j=2,3,\dots, {n}_p $$
(40)
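With 0-based indexing, the block extraction and averaging of Eqs. (39)-(40) might look like this hedged sketch (p1 = q1 = 1 by the normalization of Section 2.3):

```python
import numpy as np

def recover_parameters(theta, na, nb, nq, n_p):
    """Split theta = [q (x) a, p (x) b] and average out the q_j, p_j estimates."""
    a = theta[:na]
    b = theta[na * nq : na * nq + nb]
    q = np.ones(nq)
    for j in range(2, nq + 1):          # Eq. (39): mean of theta-block / a
        q[j - 1] = np.mean(theta[na * (j - 1) : na * j] / a)
    p = np.ones(n_p)
    for j in range(2, n_p + 1):         # Eq. (40): mean of theta-block / b
        blk = theta[na * nq + nb * (j - 1) : na * nq + nb * j]
        p[j - 1] = np.mean(blk / b)
    return a, b, p, q
```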

4 An estimation method for the fractional order \( \overline{\boldsymbol{\upalpha}} \) based on the multi-innovation principle

For the identification of a fractional-order system, the main quantities to be identified are the fractional order \( \overline{\alpha} \) and the model parameters {ai, bi, pi, qi}. Fractional order estimation and model parameter identification are two stages of the identification process, and the results of each stage provide the initial conditions for the other to proceed: the fractional order estimate enables the model parameter identification algorithm, and the identified model parameters in turn provide the premise for fractional order estimation. Figure 2 shows the interactive identification process of the two multi-innovation identification methods.

Fig. 2 The interactive identification process of the two multi-innovation identification methods

On this basis, the design of the fractional order estimation algorithm is the key step for the success of the whole algorithm. According to the objective function of Eq. (21), the entire identification objective function is:

$$ J=\frac{1}{N}\sum \limits_{k=1}^N{\left[y(k)-\hat{y}(k)\right]}^2=\frac{1}{N}\sum \limits_{k=1}^N{\left[y(k)-{\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\hat{\boldsymbol{\uptheta}}\right]}^2 $$
(41)

where \( \hat{y}(k)={\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\hat{\boldsymbol{\uptheta}} \) and N is the total number of samples.

The purpose is to find a suitable \( \hat{\overline{\alpha}} \) for the parameters \( \hat{\boldsymbol{\uptheta}} \) estimated via the multi-innovation parameter identification algorithm such that J is as small as possible. The Levenberg-Marquardt algorithm iterates as follows:

$$ {\hat{\overline{\alpha}}}^{\left(m+1\right)}={\hat{\overline{\alpha}}}^{(m)}-{\left\{{\left[{J}^{\prime \prime }+\lambda I\right]}^{-1}{J}^{\prime}\right\}}_{\hat{\overline{\alpha}}={\hat{\overline{\alpha}}}^{(m)}} $$
(42)

The update of the fractional order \( \hat{\overline{\alpha}} \) is based on the calculation of the gradient J' and the Hessian matrix J'' corresponding to each \( \overline{\alpha} \), and λ is an adjustment parameter. J' and J'' are calculated as follows:

$$ {\displaystyle \begin{array}{c}{J}_{\hat{\overline{\alpha}}}^{\prime }=-\frac{2}{N}{\left[\frac{\partial {\hat{\varphi}}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\hat{\theta}}{\partial \hat{\overline{\alpha}}}\right]}^{\mathrm{T}}\left[y(k)-{\hat{\varphi}}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\hat{\theta}\right]\\ {}=-\frac{2}{N}{\left[\frac{\partial \hat{y}(k)}{\partial \hat{\overline{\alpha}}}\right]}^{\mathrm{T}}\left[y(k)-{\hat{\varphi}}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\hat{\theta}\right]\\ {}=-\frac{2}{N}{\left[\sigma \hat{y}(k)/\hat{\overline{\alpha}}\right]}^{\mathrm{T}}\left[y(k)-{\hat{\varphi}}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\hat{\theta}\right]\end{array}} $$
(43)

where \( \sigma \hat{y}(k)/\hat{\overline{\alpha}}=\frac{\partial \hat{y}(k)}{\partial \hat{\overline{\alpha}}} \) is the sensitivity function with respect to \( \hat{\overline{\alpha}} \), and its calculation process is:

$$ \sigma \hat{y}(k)/\hat{\overline{\alpha}}\approx \frac{\hat{y}\left(k,\hat{\overline{\alpha}}+\delta \hat{\overline{\alpha}}\right)-\hat{y}\left(k,\hat{\overline{\alpha}}\right)}{\delta \hat{\overline{\alpha}}} $$
(44)

where \( \delta \hat{\overline{\alpha}} \) is a small perturbation of \( \hat{\overline{\alpha}} \).
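The forward difference of Eq. (44) is a one-liner in code; here y_hat stands for any routine returning the model output ŷ(k, ᾱ̂), and the default step 1e-4 is an illustrative choice, not a value from the paper:

```python
def sensitivity(y_hat, k, alpha, delta=1e-4):
    """Finite-difference sensitivity of Eq. (44):
    sigma y_hat(k)/alpha ~ (y_hat(k, alpha + delta) - y_hat(k, alpha)) / delta."""
    return (y_hat(k, alpha + delta) - y_hat(k, alpha)) / delta
```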

The second derivative \( {J}_{\hat{\overline{\alpha}}}^{\prime \prime } \) is calculated as:

$$ {\displaystyle \begin{array}{c}{J}_{\hat{\overline{\alpha}}}^{\prime \prime }=\frac{2}{N}{\left(\frac{\partial \hat{y}(k)}{\partial \hat{\overline{\alpha}}}\right)}^{\mathrm{T}}\left(\frac{\partial \hat{y}(k)}{\partial \hat{\overline{\alpha}}}\right)\\ {}\kern1.12em =\frac{2}{N}{\left(\sigma \hat{y}(k)/\hat{\overline{\alpha}}\right)}^{\mathrm{T}}\left(\sigma \hat{y}(k)/\hat{\overline{\alpha}}\right)\end{array}} $$
(45)

Thus, \( {J}_{\hat{\overline{\alpha}}}^{\prime } \) and \( {J}_{\hat{\overline{\alpha}}}^{\prime \prime } \) are calculated as follows:

$$ {J}_{\hat{\overline{\alpha}}}^{\prime }=-\frac{2}{N}{\left[\sigma \hat{y}(k)/\hat{\overline{\alpha}}\right]}^{\mathrm{T}}\left[y(k)-{\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\Big(k,\hat{\overline{\alpha}}\Big)\hat{\boldsymbol{\uptheta}}\right] $$
(46a)
$$ {J}_{\hat{\overline{\alpha}}}^{\prime \prime }=\frac{2}{N}{\left(\sigma \hat{y}(k)/\hat{\overline{\alpha}}\right)}^{\mathrm{T}}\left(\sigma \hat{y}(k)/\hat{\overline{\alpha}}\right) $$
(46b)

In essence, the Levenberg-Marquardt algorithm of Eq. (42) is a single-innovation estimation algorithm based on \( y(k)-{\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\left(k,\hat{\overline{\alpha}}\right)\hat{\boldsymbol{\uptheta}} \). The following introduces the multi-innovation Levenberg-Marquardt algorithm, whose basic idea is to expand the scalar innovation into an innovation vector or matrix using both current and historical data. Many research results show that multi-innovation identification can effectively improve algorithm convergence and the accuracy of parameter estimation.

The stacked sensitivity function vector \( \boldsymbol{\Xi} \left(P,k,\hat{\overline{\alpha}}\right) \) is defined as:

$$ \boldsymbol{\Xi} \left(P,k,\hat{\overline{\alpha}}\right)={\left[\sigma \hat{y}(k)/\hat{\overline{\alpha}},\sigma \hat{y}\left(k-1\right)/\hat{\overline{\alpha}},\dots, \sigma \hat{y}\left(k-P+1\right)/\hat{\overline{\alpha}}\right]}^{\mathrm{T}}\in {R}^P $$
(47)

Based on the definitions of \( \hat{\boldsymbol{\Phi}}\left(P,k,\hat{\overline{\alpha}}\right) \) and Y(P, k), the multi-innovation Levenberg-Marquardt algorithm can be expressed as follows:

$$ {\displaystyle \begin{array}{c}{\hat{\overline{\alpha}}}^{\left(m+1\right)}={\hat{\overline{\alpha}}}^{(m)}+\frac{2}{N}\Big\{{\left[\frac{2}{N}{\varXi}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)\varXi \left(P,k,\hat{\overline{\alpha}}\right)+\lambda I\right]}^{-1}\\ {}{\varXi}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)\left[Y\left(P,k\right)-{\hat{\varPhi}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)\hat{\theta}\right]\Big\}{}_{\hat{\overline{\alpha}}={\hat{\overline{\alpha}}}^{(m)}}\end{array}} $$
(48)
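Because ᾱ̂ is a scalar, the bracketed term in Eq. (48) is a scalar as well, and one iteration reduces to the following sketch (Xi is the stacked sensitivity vector of Eq. (47); all names are ours):

```python
import numpy as np

def milm_step(alpha, theta, Xi, Phi, Y, lam, N):
    """One multi-innovation Levenberg-Marquardt update of alpha, Eq. (48)."""
    E = Y - Phi.T @ theta              # multi-innovation error, Eq. (27)
    H = (2.0 / N) * (Xi @ Xi) + lam    # scalar (2/N) * Xi^T Xi + lambda damping
    return alpha + (2.0 / N) * (Xi @ E) / H
```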

In this research, the multi-innovation gradient descent method is used to identify the parameters of the fractional-order nonlinear system, and the multi-innovation Levenberg-Marquardt algorithm is used to estimate the fractional order of the system. In accordance with interactive estimation theory and the hierarchical identification principle, the two algorithms alternately perform parameter identification and fractional order estimation. In each iteration, the parameter estimates rely on the previous fractional order estimate, while the fractional order estimation is performed based on the parameter estimates of the previous iteration; the two steps form a complete hierarchical interactive calculation process.


To help readers understand the logic and innovation of this paper more clearly, we summarize the proposed algorithm in the form of pseudocode, as shown in Table 1.

Table 1 Pseudocode of the algorithm in this paper
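To show how the two stages interlock in practice, the following self-contained sketch runs the alternation on a toy first-order same-dimensional system (na = nb = 1, linear basis functions only). The innovation length P, the damping λ, the perturbation step and the clipping of ᾱ̂ to (0, 1) are our own illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def gl_diff(x, alpha, k):
    # GL fractional difference Delta^alpha x(k), h = 1 (Eqs. (3) and (5))
    beta = np.empty(k + 1)
    beta[0] = 1.0
    for j in range(1, k + 1):
        beta[j] = beta[j - 1] * (j - alpha - 1) / j
    return float(np.dot(beta, x[k::-1]))

# Simulate y(k) = a1*D^a y(k-1) + b1*D^a u(k-1) + v(k) with alpha_true = 0.6
a1, b1, alpha_true, n_samples = 0.1, -0.4, 0.6, 1000
u = rng.standard_normal(n_samples)
y = np.zeros(n_samples)
for k in range(1, n_samples):
    y[k] = (a1 * gl_diff(y, alpha_true, k - 1)
            + b1 * gl_diff(u, alpha_true, k - 1)
            + 0.1 * rng.standard_normal())

def phi(k, a):
    # information vector for na = nb = 1 with linear bases
    return np.array([gl_diff(y, a, k - 1), gl_diff(u, a, k - 1)])

P, lam, delta = 5, 1e-3, 1e-4

def yhat_stack(k, a, th):
    # stacked model outputs [yhat(k), ..., yhat(k-P+1)] at order a
    return np.array([phi(k - s, a) @ th for s in range(P)])

theta, r, alpha = np.full(2, 1e-6), 1.0, 0.4     # theta(0) = 1_n / p0, initial order guess
for k in range(P, n_samples):
    Phi = np.column_stack([phi(k - s, alpha) for s in range(P)])  # Eq. (32)
    Yk = np.array([y[k - s] for s in range(P)])                   # Eq. (33)
    E = Yk - Phi.T @ theta                                        # Eq. (30)
    r += np.linalg.norm(Phi) ** 2                                 # Eq. (31)
    theta = theta + Phi @ E / r                                   # Eq. (29): MIGD stage
    yh0 = yhat_stack(k, alpha, theta)
    Xi = (yhat_stack(k, alpha + delta, theta) - yh0) / delta      # Eqs. (44), (47)
    H = (2 / P) * (Xi @ Xi) + lam
    alpha += (2 / P) * (Xi @ (Yk - yh0)) / H                      # Eq. (48): MI-LM stage
    alpha = float(np.clip(alpha, 0.05, 0.95))  # keep the order inside (0, 1): our safeguard

print(theta, alpha)  # expected to drift toward [0.1, -0.4] and 0.6
```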

5 Performance analysis

To better illustrate the performance of the algorithm, some mathematical notation is first introduced. The symbols λmax[X] and λmin[X] represent the largest and smallest eigenvalues, respectively, of the matrix X. For g(k) ≥ 0, f(k) = O(g(k)) or f(k) ∼ O(g(k)) represents that there exists a constant δ > 0 such that f(k) ≤ δg(k). In addition, the following lemma is given.

Lemma 1

Suppose that {x(k)}, {ak} and {bk} are sequences of nonnegative random variables and satisfy the following relation:

$$ x\left(k+1\right)\le \left(1-{a}_k\right)x(k)+{b}_k,k\ge 0, $$

where ak ∈ [0, 1) and x(0) < ∞; then, it holds that \( \underset{k\to \infty }{\lim }x(k)\le \underset{k\to \infty }{\lim}\frac{b_k}{a_k} \).

Lemma 2

For system (19) and the single-innovation gradient descent algorithm given by Eqs. (23) and (24), suppose that there are constants \( 0<\overline{\alpha}<\beta <\infty \) such that the fractional order estimate \( \hat{\overline{\alpha}} \) causes the information vector \( \hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right) \) of the system to satisfy the following persistent excitation condition:

  • (A1) \( \overline{\alpha}{\mathbf{I}}_n\le \frac{1}{N}\sum \limits_{i=0}^{N-1}\hat{\boldsymbol{\upvarphi}}\left(k+i,\hat{\overline{\alpha}}\right){\hat{\boldsymbol{\upvarphi}}}^{\mathrm{T}}\left(k+i,\hat{\overline{\alpha}}\right)\le \beta {\mathbf{I}}_n, \) a.s. k > 0

Then, r(k) in Eq. (24) satisfies the following inequality:

\( n\overline{\alpha}\left(k-N+1\right)+1\le r(k)\le n\beta \left(k+N-1\right)+1 \), a.s. k > 0.

Proof

By taking the trace of both sides of condition (A1), we obtain:

\( nN\overline{\alpha}\le \sum \limits_{i=0}^{N-1}{\left\Vert \hat{\boldsymbol{\upvarphi}}\left(k+i,\hat{\overline{\alpha}}\right)\right\Vert}^2\le nN\beta, \) a.s. k > 0.

Let [x] denote the largest integer no greater than x and define δ1 = nNβ; then \( \sum \limits_{i=1}^N{\left\Vert \hat{\boldsymbol{\upvarphi}}\left( jN+i,\hat{\overline{\alpha}}\right)\right\Vert}^2\le {\delta}_1,\mathrm{a}.\mathrm{s}. \) For Eq. (24), continuous iterative calculation gives:

$$ {\displaystyle \begin{array}{c}r(k)=r\left(k-1\right)+{\left\Vert \hat{\varphi}\Big(k,\hat{\overline{\alpha}}\Big)\right\Vert}^2=\sum \limits_{j=1}^k{\left\Vert \hat{\varphi}\Big(j,\hat{\overline{\alpha}}\Big)\right\Vert}^2+r(0)\\ {}\kern2em \le \sum \limits_{j=0}^{\left[\frac{k-1}{N}\right]}\sum \limits_{i=1}^N{\left\Vert \hat{\varphi}\left( jN+i,\hat{\overline{\alpha}}\right)\right\Vert}^2+r(0)\\ {}\kern2em \le \sum \limits_{j=0}^{\left[\frac{k-1}{N}\right]}{\delta}_1+r(0)\\ {}\kern2em \le \left(\left[\frac{k-1}{N}\right]+1\right){\delta}_1+1\\ {}\kern2em \le n\beta \left(k+N-1\right)+1\end{array}} $$
(49)

In addition:

$$ {\displaystyle \begin{array}{c}r(k)\ge \sum \limits_{j=0}^{\left[\frac{k}{N}\right]-1}\sum \limits_{i=1}^N{\left\Vert \hat{\varphi}\Big( jN+i,\hat{\overline{\alpha}}\Big)\right\Vert}^2+r(0)\\ {}\kern2em \ge \sum \limits_{j=0}^{\left[\frac{k}{N}\right]-1} nN\overline{\alpha}+r(0)\\ {}\kern2em \ge \left(\left[\frac{k}{N}\right]\right) nN\overline{\alpha}+1\\ {}\kern2em \ge n\overline{\alpha}\left(k-N+1\right)+1\end{array}} $$
(50)

The proof of the lemma is complete.

Theorem

For system (19) and the multi-innovation gradient descent algorithm given by Eqs. (29)–(38), suppose that the fractional order estimate \( \hat{\overline{\alpha}} \) causes the information vector \( \hat{\boldsymbol{\upvarphi}}\left(k,\hat{\overline{\alpha}}\right) \) of the system to satisfy the persistent excitation condition (A1), and that the noise signal {v(k)} is an independent random signal that satisfies

  • (A2) E[v(k)] = 0, E[v(k)v(i)] = 0, k ≠ i, \( \mathrm{E}\left[{v}^2(k)\right]={\sigma}_v^2 \),

where E(⋅) is the mathematical expectation. Then, the parameter estimation error vector \( \tilde{\boldsymbol{\uptheta}}(k):= \hat{\boldsymbol{\uptheta}}(k)-\boldsymbol{\uptheta} \) satisfies:

$$ \underset{k\to \infty }{\lim}\mathrm{E}\left[{\left\Vert \tilde{\boldsymbol{\uptheta}}(k)\right\Vert}^2\right]\le \underset{k\to \infty }{\lim}\frac{N{\beta \sigma}_v^2\left[ n\beta \left(k-N+1\right)+1\right]}{\overline{\alpha}{\left[n\overline{\alpha}\left(k-N+1\right)+1\right]}^2}. $$

Proof

To simplify the calculation, let the innovation length be P = N, and define the noise vector as:

$$ \mathbf{V}\left(P,k\right)={\left[v(k),v\left(k-1\right),\cdots, v\left(k-P+1\right)\right]}^{\mathrm{T}}\in {R}^P, $$

By subtracting θ from both sides of Eq. (29) and using Eqs. (30), (32), (33) and (19), we obtain:

$$ {\displaystyle \begin{array}{c}\overset{\sim }{\theta }(k)=\overset{\sim }{\theta}\left(k-1\right)+\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right)}{r(k)}\left[-{\hat{\varPhi}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)\overset{\sim }{\theta}\left(k-1\right)+V\left(P,k\right)\right]\\ {}\kern1.48em =\left[I-\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right){\hat{\varPhi}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)}{r(k)}\right]\overset{\sim }{\theta}\left(k-1\right)+\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right)V\left(P,k\right)}{r(k)}\end{array}} $$
(51)

Taking the norm on both sides of the above equation yields:

$$ {\displaystyle \begin{array}{c}{\left\Vert \overset{\sim }{\theta }(k)\right\Vert}^2\le {\left\Vert \left[I-\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right){\hat{\varPhi}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)}{r(k)}\right]\overset{\sim }{\theta}\left(k-1\right)\right\Vert}^2+\dots \\ {}\kern3.25em 2{\overset{\sim }{\theta}}^{\mathrm{T}}\left(k-1\right)\left[I-\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right){\hat{\varPhi}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)}{r(k)}\right]\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right)V\left(P,k\right)}{r(k)}\\ {}\kern3.12em +\frac{{\left\Vert \hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right)V\Big(P,k\Big)\right\Vert}^2}{r^2(k)}\end{array}} $$
$$ {\displaystyle \begin{array}{c}\le {\lambda}_{\mathrm{max}}\left[I-\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right){\hat{\varPhi}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)}{r(k)}\right]{\left\Vert \overset{\sim }{\theta}\left(k-1\right)\right\Vert}^2+\dots \\ {}\kern0.75em 2{\overset{\sim }{\theta}}^{\mathrm{T}}\left(k-1\right)\left[I-\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right){\hat{\varPhi}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)}{r(k)}\right]\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right)V\left(P,k\right)}{r(k)}\\ {}\kern1em +\frac{{\left\Vert \hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right)V\Big(P,k\Big)\right\Vert}^2}{r^2(k)}\end{array}} $$
(52)

Using Lemma 2 and condition (A1), we can obtain:

$$ {\displaystyle \begin{array}{c}I-\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right){\hat{\varPhi}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)}{r(k)}\le \left[1-\frac{N\overline{\alpha}}{n\beta \left(k-N+1\right)+1}\right]I,\kern1em \mathrm{a}.\mathrm{s}.\\ {}\kern0.75em \mathrm{E}\left[{\left\Vert \hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right)V\Big(P,k\Big)\right\Vert}^2\right]\\ {}\le \mathrm{E}\left\{{\lambda}_{\mathrm{max}}\left[\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right){\hat{\varPhi}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)\right]{\left\Vert V\Big(P,k\Big)\right\Vert}^2\right\}\\ {}\le N\beta \mathrm{E}\left[{\left\Vert V\left(P,k\right)\right\Vert}^2\right]\le {P}^2\beta {\sigma}_v^2={N}^2\beta {\sigma}_v^2\end{array}} $$
(53)

Taking mathematical expectations on both sides of Eq. (52), noting that V(P, k) is uncorrelated with \( \tilde{\boldsymbol{\uptheta}}\left(k-1\right) \), \( \hat{\boldsymbol{\Phi}}\left(P,k,\hat{\overline{\alpha}}\right) \) and \( \mathbf{I}-\frac{\hat{\boldsymbol{\Phi}}\left(P,k,\hat{\overline{\alpha}}\right){\hat{\boldsymbol{\Phi}}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)}{r(k)} \), and using conditions (A1) and (A2), we have

$$ {\displaystyle \begin{array}{c}\mathrm{E}\left[{\left\Vert \overset{\sim }{\theta }(k)\right\Vert}^2\right]\le \left[1-\frac{N\overline{\alpha}}{n\beta \left(k-N+1\right)+1}\right]\mathrm{E}\left[{\left\Vert \overset{\sim }{\theta}\left(k-1\right)\right\Vert}^2\right]+\dots \\ {}\kern5em 2\mathrm{E}\left\{\overset{\sim }{\theta}\left(k-1\right)\left[I-\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right){\hat{\varPhi}}^{\mathrm{T}}\left(P,k,\hat{\overline{\alpha}}\right)}{r(k)}\right]\frac{\hat{\varPhi}\left(P,k,\hat{\overline{\alpha}}\right)V\left(P,k\right)}{r(k)}\right\}\\ {}\kern4.36em +\frac{N^2\beta {\sigma}_v^2}{{\left[n\overline{\alpha}\left(k-N+1\right)+1\right]}^2}\\ {}\le \left[1-\frac{N\overline{\alpha}}{n\beta \left(k-N+1\right)+1}\right]\mathrm{E}\left[{\left\Vert \overset{\sim }{\theta}\left(k-1\right)\right\Vert}^2\right]+\frac{N^2\beta {\sigma}_v^2}{{\left[n\overline{\alpha}\left(k-N+1\right)+1\right]}^2}\end{array}} $$
(54)

Using Lemma 1, we obtain:

$$ \underset{k\to \infty }{\lim}\mathrm{E}\left\{{\left\Vert \tilde{\boldsymbol{\uptheta}}(k)\right\Vert}^2\right\}\le \underset{k\to \infty }{\lim}\frac{N{\beta \sigma}_v^2\left[ n\beta \left(k-N+1\right)+1\right]}{\overline{\alpha}{\left[n\overline{\alpha}\left(k-N+1\right)+1\right]}^2} $$
(55)

The theorem is proven.

The meaning of the above theorem is that since the two multi-innovation algorithms are interactive, if the fractional order estimate can ensure that the input vector of the system is continuously excited, then the parameter identification error of the system can be bounded.

6 Experimental study

6.1 Academic example

To verify the effectiveness of the proposed algorithm, the following fractional-order nonlinear system is considered:

$$ y(k)=A(z)g\left(y(k)\right)+B(z)f\left(u(k)\right)+v(k) $$
(56)

where \( A(z)={a}_1{z}^{-\overline{\alpha}}+{a}_2{z}^{-2\overline{\alpha}} \), \( B(z)={b}_1{z}^{-\overline{\alpha}}+{b}_2{z}^{-2\overline{\alpha}} \), \( f\left(u(k)\right)=\sum \limits_{j=1}^2{p}_j{f}_j\left(u(k)\right) \), and \( g\left(y(k)\right)=\sum \limits_{j=1}^3{q}_j{g}_j\left(y(k)\right) \).

The entire output of the system is:

$$ {\displaystyle \begin{array}{c}y(k)=\sum \limits_{j=1}^3{q}_j\sum \limits_{i=1}^2{a}_i{\varDelta}^{\overline{\alpha}}{g}_j\left(y\left(k-i\right)\right)\\ {}\kern2.5em +\sum \limits_{j=1}^2{p}_j\sum \limits_{i=1}^2{b}_i{\varDelta}^{\overline{\alpha}}{f}_j\left(u\left(k-i\right)\right)+v(k)\end{array}} $$
(57)

where f1(u(k − i)) = u(k − i), f2(u(k − i)) = u2(k − i), g1(y(k − i)) = y(k − i), g2(y(k − i)) = y2(k − i), and g3(y(k − i)) = y3(k − i).

When the fractional order is taken to be \( \overline{\alpha}=0.6 \), the entire output of the system is:

$$ {\displaystyle \begin{array}{c}y(k)={a}_1{\varDelta}^{0.6}y\left(k-1\right)+{a}_2{\varDelta}^{0.6}y\left(k-2\right)+{q}_2{a}_1{\varDelta}^{0.6}{y}^2\left(k-1\right)+\\ {}{q}_2{a}_2{\varDelta}^{0.6}{y}^2\left(k-2\right)+{q}_3{a}_1{\varDelta}^{0.6}{y}^3\left(k-1\right)+\\ {}{q}_3{a}_2{\varDelta}^{0.6}{y}^3\left(k-2\right)+{b}_1{\varDelta}^{0.6}u\left(k-1\right)+{b}_2{\varDelta}^{0.6}u\left(k-2\right)\\ {}\kern2.24em +{p}_2{b}_1{\varDelta}^{0.6}{u}^2\left(k-1\right)+{p}_2{b}_2{\varDelta}^{0.6}{u}^2\left(k-2\right)+v(k)\end{array}} $$

The parameter vector is:

a = [a1, a2]T = [0.1, 0.2]T, b = [b1, b2]T = [−0.4, −0.2]T, p = [p1, p2]T = [1, 0.5]T, q = [q1, q2, q3]T = [1, 0.7, 0.35]T.

$$ {\displaystyle \begin{array}{c}\theta ={\left[{a}_1\kern0.5em {a}_2\kern0.5em {q}_2{a}_1\kern0.5em {q}_2{a}_2\kern0.5em {q}_3{a}_1\kern0.5em {q}_3{a}_2\kern0.5em {b}_1\kern0.5em {b}_2\kern0.5em {p}_2{b}_1\kern0.5em {p}_2{b}_2\right]}^{\mathrm{T}}\\ {}={\left[0.1\kern0.5em 0.2\kern0.5em 0.07\kern0.5em 0.14\kern0.5em 0.035\kern0.5em 0.07\kern0.5em -0.4\kern0.5em -0.2\kern0.5em -0.2\kern0.5em -0.1\right]}^{\mathrm{T}}\end{array}} $$

During the simulation, the input signal is a random signal with zero mean and unit variance, and 10,000 samples are collected. The noise signal is an independent random signal with zero mean and a variance of σ2 = 0.01. Multi-innovation lengths of P = 1, 3, 5 are selected. To verify the effectiveness of the proposed method, the relative parameter estimation error \( \delta := \left\Vert \hat{\boldsymbol{\uptheta}}(k)-\boldsymbol{\uptheta} \right\Vert /\left\Vert \boldsymbol{\uptheta} \right\Vert \) is used as the evaluation indicator. The parameter identification results are given in Table 2. Figure 3 shows the parameter estimation error curves for different multi-innovation lengths, and Fig. 4 shows the corresponding estimation results for the fractional order. From the analysis and comparison of these results, it is evident that as the multi-innovation length increases, the convergence of identification becomes faster and the identification accuracy becomes higher. Figure 5 compares the output of the identification model with the actual output when the multi-innovation length is P = 5.
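For reference, the evaluation index δ for this example can be computed directly from the true parameter vector given above (a minimal sketch):

```python
import numpy as np

theta_true = np.array([0.1, 0.2, 0.07, 0.14, 0.035, 0.07,
                       -0.4, -0.2, -0.2, -0.1])

def relative_error(theta_hat):
    """delta = ||theta_hat - theta|| / ||theta||."""
    return np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true)
```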

Table 2 Partial parameter identification results of fractional-order nonlinear systems (P = 1, 3, 5)
Fig. 3 Parameter estimation error curves for different multi-innovation lengths

Fig. 4 Estimation results for the fractional order with different multi-innovation lengths

Fig. 5 Comparison of the output of the identification model (P = 5) with the actual output

To illustrate the superiority of the proposed method, it is compared with the single-innovation Levenberg-Marquardt algorithm and the single-innovation gradient descent method proposed in references [28, 29]. Figure 6 compares the relative error curves of the method in this paper (P = 5) with those of the methods of [28, 29]. From this comparison, it is evident that the method in this paper offers faster identification convergence and higher identification accuracy.

Fig. 6 Relative parameter estimation error curves for the method in this paper (P = 5) and the methods in the literature

6.2 Actual system

To further illustrate the applicability of this method to a practical system, we model and identify a flexible manipulator system from the Manufacturing and Automation Laboratory of Katholieke Universiteit Leuven. In this system, a robotic arm is mounted on a motor; the system input is the reaction torque of the structure, and the output is the acceleration of the flexible arm [30]. To model this system using the method proposed in this paper, the structure of the system needs to be determined first. Reference [28] has carried out Hammerstein-Wiener modeling for this system; therefore, we adopt the system structure determined there and directly apply the method proposed in this paper. Unlike in the academic example of the previous subsection, where the real parameters of the system are known and the modeling effect can be evaluated simply by comparing the accuracy of the parameters, the real parameters of an actual system are unknown, so we instead compare how well the estimated output of the system fits the actual output.

First, on the premise that the system structure is determined, we use the method proposed in this paper to model the robotic arm system and obtain the objective function values for different innovation lengths. Figure 7 shows the objective functions for innovation lengths of P = 1, 3, 5. It can be seen from Fig. 7 that as the number of iterations increases, the objective function gradually converges. Zooming in on the relevant part of the figure shows that the greater the innovation length, the higher the estimation accuracy and the smaller the objective function; the objective function is smallest when P = 5.

Fig. 7 Objective function J for different multi-innovation lengths

Therefore, an innovation length of P = 5 is selected to model the robotic arm system. Figure 8 shows the output fit for this innovation length; the actual output and the estimated output are basically in agreement. Figure 9 shows the system estimation error, which is small, indicating high modeling accuracy. To further verify the superiority of the method in this paper, we also compare it with other methods from the literature. References [14, 28] proposed different methods for the modeling and identification of this manipulator system and presented corresponding output error figures; Table 3 compares our error with their results. The error of the method proposed in this paper is clearly smaller, and its modeling accuracy is higher.

Fig. 8 Actual and estimated system output (P = 5)

Fig. 9 Estimated output error of the system (P = 5)

Table 3 Comparison of the error range of the robot arm estimation with the literature (P = 5)

7 Conclusion

To overcome the difficulties in identifying fractional-order nonlinear systems, a hybrid parameter identification algorithm based on the multi-innovation principle is proposed. A multi-innovation recursive gradient descent algorithm and a multi-innovation Levenberg-Marquardt algorithm are designed to estimate the model parameters and the fractional order of the system, respectively. Simulation of an academic example shows that proper use of the multi-innovation principle can increase the convergence speed of the proposed identification algorithm and improve the identification accuracy. In addition, the proposed algorithm is applied to an actual flexible manipulator system, verifying the applicability of the multi-innovation principle to practical problems and confirming the practicability of the algorithm. In both the academic and practical applications, the model identification results are compared with those of algorithms proposed by other scholars, and the method proposed in this paper achieves higher modeling accuracy and lower error.

From our simulation study, we obtain the following conclusions:

  1) With an appropriate increase in the innovation length, both the convergence speed and the accuracy of the identification algorithm can be improved.

  2) By introducing the multi-innovation principle, a multi-innovation gradient descent algorithm and a multi-innovation Levenberg-Marquardt algorithm are designed. The two algorithms take turns estimating the model parameters and the fractional order, and the overall algorithm is simple and convenient.

The modeling of MIMO fractional-order nonlinear systems remains a challenging topic and a direction for future research.