1 Introduction

Most major real processes exhibit nonlinear dynamic behavior, and their modeling and identification remain an active research topic given the diversity and complexity of nonlinear systems. Their adequate description over the entire range of operation has required many structures such as Volterra series, NARMAX models, neural networks, and black-box or block-oriented models [1,2,3,4]. The block-structured class allows the separation of the linear dynamic part and the nonlinear static part into different subsystems that can be interconnected in different orders (Hammerstein, Wiener, Hammerstein–Wiener, etc.).

The most general model of this class is the Hammerstein–Wiener (H-W) model, which consists of three subsystems: a linear block embedded between two nonlinear subsystems. This more elaborate topology can improve the model’s performance in describing a real nonlinear system with both an actuator nonlinearity and a sensor nonlinearity [5]. Some work has also shown that the H-W system can approximate almost any high-order nonlinear system relatively well [6, 7]. H-W models have been successfully applied to numerous technological processes such as fermentation bioreactors [8], the skeletal muscle system [9], chemical processes [10], electrical discharge machining [11], temperature variations in a silage bale [12], etc.

On the other hand, many physical nonlinear processes and materials exhibit a fractional behavior, characterized by a hereditary property and an infinite-dimensional structure; hence, fractional order models have gained increasing interest among researchers [13,14,15,16]. Their main advantage is parsimony: they can mimic the dynamic behavior of real processes with better accuracy than their classical integer order counterparts, in addition to having a “memory” included in the model. Among the various applications, we can cite quantum mechanics [17], chaotic motions [18], diffusive phenomena [19], and biochemical reactors [20].

This paper addresses the identification of fractional H-W models, i.e., H-W structures whose linear part is of fractional order. The goal is to combine the descriptive capability of the H-W system with the parsimony of fractional order models.

Nonlinear system identification is a major research topic; the most difficult tasks are selecting the model structure and establishing a suitable identification approach. A drawback reported in the literature is that achieving an accurate identification requires dealing with the curse of dimensionality.

Many studies have considered the identification of the classical integer order class of systems; most have focused on the simpler Hammerstein or Wiener structures, while relatively little work has addressed H-W system identification [5, 21,22,23,24].

In the area of fractional block-oriented system identification, Hammerstein models have been identified using heuristic approaches such as particle swarm optimization (PSO) [25], or genetic algorithm (GA) combined with the recursive least squares (RLS) [26]. An iterative linear optimization algorithm and a Lyapunov method have been developed in [27]. As for the fractional Wiener model, an output error method is used in [28], while a modified PSO is extended in [29].

As a matter of fact, the identification of fractional Hammerstein–Wiener models is more difficult than that of the simpler Hammerstein and Wiener systems; the complexity lies in the fact that they involve two unknown internal signals that are not accessible to measurement. Consequently, to the best of the authors’ knowledge, only one study has considered the identification of continuous-time fractional H-W models, based on an instrumental variable method [30].

In this context, a novel approach for identifying fractional H-W systems in the discrete case is presented.

It is based on an output error approach using a nonlinear optimization method: the robust Levenberg–Marquardt (L-M) algorithm is developed for the fractional H-W model. Its main drawback lies in the parametric sensitivity functions necessary for the gradient and Hessian computation in the update rule. Their complexity depends on the chosen model and, in the H-W case, requires a heavy computational load at each iteration. To overcome this difficulty, the method reformulates the fractional H-W model in a regression form, which allows a better model parameterization. As a result, the gradient and the Hessian can be obtained in closed form, avoiding the computation of the sensitivity functions, and the update burden is drastically reduced. To assess its performance, the method is applied to the identification of a real robot arm system.

The paper is organized as follows:

Section 2 introduces the required theoretical concepts of fractional calculus. In Sect. 3, the fractional H-W system is presented along with the problem formulation, while the identification method is developed in Sect. 4. Section 5 illustrates the method efficiency with some simulation results and its application to the modeling of an arm robot. Finally, conclusion and some perspectives are provided in Sect. 6.

2 Mathematical background

2.1 Fractional derivative

Fractional calculus has attracted an increasing interest among researchers in recent decades, with applications in system modeling and control [19, 31, 32].

Different definitions of the differintegral operator have been proposed in the literature; the most commonly used in the discrete case is the Grünwald–Letnikov (G-L) definition, expressed as follows [33, 34]:

$$\begin{aligned} \Delta ^{\alpha }x(kh)=\dfrac{1}{h^\alpha }\sum _{j=0}^{k}(-1)^{j} \left( {\begin{array}{c}\alpha \\ j\end{array}}\right) x((k-j)h) \end{aligned}$$
(1)

where \(\Delta ^{\alpha }\) denotes the fractional order difference operator of order \(\alpha \), with zero initial time, x(kh) is a function of \(t=kh\), k is the number of samples, and h is the sampling interval which is assumed to be equal to 1.

\(\left( {\begin{array}{c}\alpha \\ j\end{array}}\right) \) is the binomial term defined by:

$$\begin{aligned} \left( {\begin{array}{c}\alpha \\ j\end{array}}\right) =\left\{ \begin{array}{lll} 1 &{}\quad {\text { for }} \, j=0 \\ \frac{\alpha (\alpha -1) \cdots (\alpha -j+1)}{j!} &{}\quad {\text { for }} \,j>0 \end{array} \right. \end{aligned}$$
(2)

Let us define the following recurrence relation:

$$\begin{aligned} \left\{ \begin{array}{rcl} \beta (0)&{}=&{}1 \\ \beta (j)&{}=&{}\beta (j-1)\dfrac{j-1-\alpha }{j} \quad \text { for } \,j=1, \ldots ,k \end{array} \right. \end{aligned}$$
(3)

where

$$\begin{aligned} \beta (j)=(-1)^{j}\left( {\begin{array}{c}\alpha \\ j\end{array}}\right) \end{aligned}$$
(4)

Eq. (1) can then be rewritten in the form:

$$\begin{aligned} \Delta ^{\alpha }x(k)=\sum _{j=0}^{k} \beta (j) x(k-j) \end{aligned}$$
(5)

The numerical simulation of the fractional system studied in this paper is performed using Eq. (5).
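As an illustration, Eqs. (3)–(5) can be sketched in a few lines of Python (a minimal sketch assuming \(h=1\) and zero initial conditions; the function names are ours):

```python
import numpy as np

def gl_coeffs(alpha, k):
    """Coefficients beta(j) = (-1)^j * C(alpha, j) for j = 0..k,
    computed with the recurrence beta(j) = beta(j-1) * (j - 1 - alpha) / j."""
    beta = np.empty(k + 1)
    beta[0] = 1.0
    for j in range(1, k + 1):
        beta[j] = beta[j - 1] * (j - 1 - alpha) / j
    return beta

def gl_difference(x, alpha):
    """Grunwald-Letnikov fractional difference of order alpha (h = 1),
    Eq. (5): Delta^alpha x(k) = sum_{j=0}^{k} beta(j) x(k - j)."""
    x = np.asarray(x, dtype=float)
    beta = gl_coeffs(alpha, len(x) - 1)
    return np.array([beta[: k + 1] @ x[k::-1] for k in range(len(x))])
```

For \(\alpha =1\) the coefficients reduce to \(\beta =(1,-1,0,\ldots )\) and the operator collapses to the ordinary first difference, which provides a convenient sanity check.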

2.2 Fractional order models

Different models can be defined in the fractional system description. In this study, the transfer function and the recurrence equation are considered.

The discrete transfer function representation is defined by the following equation:

$$\begin{aligned} G(z)=\dfrac{Y(z)}{U(z)}=\dfrac{\sum _{i=1}^{n_{b}}b_{i}z^{-\gamma _{i}}}{ \sum _{i=1}^{n_{a}}a_{i}z^{-\alpha _{i}}} \end{aligned}$$
(6)

where U(z), Y(z) are, respectively, the system input and the system output, \(\alpha _i\) and \(\gamma _j\in {\mathbb {R}}^{*+}\) are the fractional orders (\(i=1,\ldots ,n_{a}\), and \(j=1,\ldots ,n_{b}\)), and \(z^{-1}\) is the backward shift operator with \(z^{-1}y(k)= y(k-1)\).

The recurrence equation of the model Eq. (6) can be deduced; it is expressed as follows:

$$\begin{aligned} \sum ^{n_{a}}_{i=1}a_{i}z^{-\alpha _{i}}y(k)=\sum _{i=1}^{n_{b}}b_{i}z^{-\gamma _{i}}u(k) \end{aligned}$$
(7)

The fractional models of Eq. (6) and Eq. (7) are called non-commensurate order systems when the fractional orders are completely different; when, instead, they are all multiples of the same base order \({\tilde{\alpha }}\in {\mathbb {R}}^{*+}\) (\(\alpha _i=i{\tilde{\alpha }}\) and \(\gamma _j=j{\tilde{\alpha }}\)), the models are of commensurate order. In this paper, the case of fractional commensurate order systems is considered, and the recurrence Eq. (7) can be rewritten in the form:

$$\begin{aligned} \sum ^{n_{a}}_{i=1}a_{i}z^{{-i{\tilde{\alpha }}}}y(k)= \sum _{i=1}^{n_{b}}b_{i}z^{-i{\tilde{\alpha }}}u(k) \end{aligned}$$
(8)

Using the discrete fractional order operator \(\Delta \) in the time domain, Eq. (8) yields the following equation:

$$\begin{aligned} \sum ^{n_{a}}_{i=1}a_{i}\Delta ^{{\tilde{\alpha }}}y(k-i)= \sum _{i=1}^{n_{b}}b_{i}\Delta ^{{\tilde{\alpha }}}u(k-i) \end{aligned}$$
(9)

This model will be used to describe the linear part of the fractional H-W model.

3 Problem description

The general structure of a H-W system is defined by the cascade connection of two nonlinear subsystems with a linear fractional dynamic block embedded between them according to Fig. 1.

Fig. 1
figure 1

The Hammerstein–Wiener system

In this work, the Hammerstein–Wiener model defined in [35] and represented by the block structure of Fig. 2 is adopted. Its input/output equation is expressed as follows:

$$\begin{aligned} y(k)=A(z)g(y(k))+B(z)f(u(k))+v(k) \end{aligned}$$
(10)
Fig. 2
figure 2

The Hammerstein–Wiener model

where u(k) and y(k) are, respectively, the input and the output of the overall system, and v(k) is the noise. The nonlinear parts are described by the functions f and g, while the linear part of fractional order is described by the polynomials A(z) and B(z) of the shift operator \(z^{-1}\) given by the following equations:

$$\begin{aligned} A\left( z\right)= & {} a{_1}z^{-\alpha _1}+a_{{2}}z^{-\alpha _2} \cdots \nonumber \\&+\, a_{n_{a}}z^{-\alpha _{n_{a}}}=\sum \limits ^{n_{a}}_{i=1}a_{i}z^{-\alpha _{i}}\nonumber \\ B\left( z\right)= & {} b{_1}z^{-\gamma _1}+b_{2}z^{-\gamma _2} \cdots \nonumber \\&+\, b_{n_{b}}z^{-\gamma _{n_{b}}}=\sum \limits ^{n_{b}}_{j=1} b_{j}z^{-\gamma _{j}} \end{aligned}$$
(11)

The nonlinear functions f and g are expressed as a linear combination of a known basis, respectively:

\({{\varvec{f}}}=(f_1,\ldots ,f_{n_{p}})\) with the coefficients (\(p_1,\ldots ,p_{n_{p}}\))

\({{\varvec{g}}}=(g_1,\ldots ,g_{n_{q}})\) with the coefficients (\(q_1,\ldots ,q_{n_{q}}\))

where

$$\begin{aligned} f(u(k))= & {} p_{1}f_{1}(u(k))+p_{2}f_{2}(u(k)) \cdots \nonumber \\&+p_{n_{p}}f_{n_{p}}(u(k)) \nonumber \\= & {} \sum \limits _{i=1}^{n_p}p_if_i(u(k)) \end{aligned}$$
(12)
$$\begin{aligned} g(y(k))= & {} q_{1}g_{1}(y(k))+q_{2}g_{2}(y(k)) \cdots \nonumber \\&+q_{n_{q}}g_{n_{q}}(y(k)) \nonumber \\= & {} \sum \limits _{j=1}^{n_q}q_jg_j(y(k)) \end{aligned}$$
(13)

Replacing A(z) and B(z) in Eq. (10) gives:

$$\begin{aligned} y(k)= & {} \sum \limits _{i=1}^{n_{a}}a_{i} z^{-\alpha _{i}} g(y(k))+ \sum \limits _{i=1}^{n_{b}}b_{i} z^{-\gamma _{i}} f(u(k))\nonumber \\&+ v(k) \end{aligned}$$
(14)

Substituting Eq.(12) and Eq.(13) in Eq. (14) results in the Hammerstein–Wiener system overall equation:

$$\begin{aligned} y(k)= & {} \sum \limits _{i=1}^{n_{a}}a_{i} z^{-\alpha _{i}} \sum \limits ^{n_{q}}_{j=1} q_{j}g_{j}(y(k)) \nonumber \\&+ \sum \limits _{i=1}^{n_{b}}b_{i} z^{-\gamma _{i}} \sum \limits ^{n_{p}}_{j=1}p_{j} f_{j}(u(k))+v(k) \end{aligned}$$
(15)

The commensurate order case being considered, (\(\alpha _i=i{\tilde{\alpha }}\)) and (\(\gamma _j=j{\tilde{\alpha }}\)), Eq. (15) can be written in the time domain, using the difference operator \(\Delta \):

$$\begin{aligned} y(k)= & {} \sum \limits _{i=1}^{n_{a}}a_{i} \sum \limits _{j=1}^{n_{q}} q_{j}\Delta ^{{\tilde{\alpha }}}g_{j}(y(k-i)) \nonumber \\&+ \sum \limits _{i=1}^{n_{b}}b_{i} \sum \limits _{j=1}^{n_{p}} p_{j} \Delta ^{{\tilde{\alpha }}}f_{j}(u(k-i))+v(k) \nonumber \\&=q_1 \sum \limits _{i=1}^{n_{a}}a_{i}\Delta ^{{\tilde{\alpha }}}g_{1}(y(k-i)) \cdots \nonumber \\&+ q_{n_{q}}\sum \limits _{i=1}^{n_{a}}a_{i}\Delta ^{{\tilde{\alpha }}}g_{n_{q}}(y(k-i)) \nonumber \\&+p_1\sum \limits _{i=1}^{n_{b}}b_{i}\Delta ^{{\tilde{\alpha }}}f_{1}(u(k-i)) \cdots \nonumber \\&+p_{n_{p}}\sum \limits _{i=1}^{n_{b}}b_{i}\Delta ^{{\tilde{\alpha }}}f_{{n_{p}}}(u(k-i))+v(k) \end{aligned}$$
(16)

The fractional H-W system is defined by the parameter vectors of the linear and the nonlinear subsystems as follows:

$$\begin{aligned} {\varvec{a}}= & {} \left[ a_1\; a_2 \cdots a_{n_{a}}\right] ^\text {T} \in {\mathbb {R}}^{n_a} ,\; \nonumber \\ {\varvec{b}}= & {} \left[ b_1 \; b_2 \cdots b_{n_{b}}\right] ^\text {T}\in {\mathbb {R}}^{n_b} \nonumber \\ {\varvec{p}}= & {} \left[ p_1 \; p_2 \cdots p_{n_{p}} \right] ^\text {T} \in {\mathbb {R}}^{n_p},\; \nonumber \\ {\varvec{q}}= & {} \left[ q_1 \; q_2 \cdots q_{n_{q}} \right] ^\text {T} \in {\mathbb {R}}^{n_q} \end{aligned}$$
(17)

Notice that to obtain a unique parameterization, the model parameters must be normalized [36]; thus, the first coefficient of each nonlinear block is fixed to one, i.e., \(p_1=1\) and \(q_1=1\), and Eq. (16) can be rewritten as:

$$\begin{aligned} y(k)= & {} \sum \limits _{i=1}^{n_{a}}a_{i}\Delta ^{{\tilde{\alpha }}}g_{1}(y(k-i)) \nonumber \\&+q_2\sum \limits _{i=1}^{n_{a}}a_{i}\Delta ^{{\tilde{\alpha }}}g_{2}(y(k-i)) \cdots \nonumber \\&+\, q_{n_{q}}\sum \limits _{i=1}^{n_{a}}a_{i}\Delta ^{{\tilde{\alpha }}}g_{n_{q}}(y(k-i))\nonumber \\&+\,\sum \limits _{i=1}^{n_{b}}b_{i}\Delta ^{{\tilde{\alpha }}}f_{1}(u(k-i)) \nonumber \\&+p_2\sum \limits _{i=1}^{n_{b}}b_{i}\Delta ^{{\tilde{\alpha }}}f_{2}(u(k-i)) \cdots \nonumber \\&+\, p_{n_{p}}\sum \limits _{i=1}^{n_{b}}b_{i} \Delta ^{{\tilde{\alpha }}}f_{{n_{p}}}(u(k-i))+v(k) \end{aligned}$$
(18)

The main contribution of this work is to develop a novel identification approach to estimate the unknown parameters of the different subsystems and the fractional order of the Hammerstein–Wiener model.

4 Identification method

The identification objective consists in estimating the parameter vectors \({\varvec{a}}\), \({\varvec{b}}\), \({\varvec{p}}\), \({\varvec{q}}\) and the order \({\tilde{\alpha }}\) in Eq. (18). The approach is based on an output error method using the L-M algorithm, a robust nonlinear optimization method that combines the Gauss–Newton method and the steepest descent. However, it suffers from the complex computation of the parametric sensitivity functions required for the gradient and Hessian calculation. In this paper, the L-M algorithm is extended to the identification of the fractional H-W model, and a better model parameterization is achieved by reformulating the system output Eq. (18) in a regression form. This represents the nonlinear input/output relationship with a linear-in-the-parameters structure:

$$\begin{aligned} y(k)=\varvec{\varphi }^{\text {T}}(k,{ \tilde{\alpha }}){\tilde{\varvec{\theta }}}+v(k) \end{aligned}$$
(19)

where \(\varvec{\varphi }(k,{\tilde{\alpha }})\) denotes the information vector defined as follows:

$$\begin{aligned} \varvec{\varphi }(k,{\tilde{\alpha }})=\left[ \begin{array}{c} \varvec{\psi }(k, {\tilde{\alpha }})\\ \varvec{\phi }(k,{\tilde{\alpha }}) \end{array}\right] \end{aligned}$$
(20)

where

$$\begin{aligned}&\varvec{\psi }(k,{\tilde{\alpha }})=\left[ \begin{array}{rlc} \psi _1(k,{\tilde{\alpha }}) \cdots \psi _{n_{q}}(k,{\tilde{\alpha }}) \end{array}\right] ^{\text {T}} \end{aligned}$$
(21)
$$\begin{aligned}&\begin{array}{cl} \varvec{\psi }_{i}(k,{\tilde{\alpha }})=\left[ \begin{array}{rlc} \Delta ^{{\tilde{\alpha }}}g_i(y(k-1)) \cdots \Delta ^{{\tilde{\alpha }}}g_i(y(k- n_a)) \end{array}\right] \\ {} &{}\\ \text {for}\quad i=1,\ldots , n_q \end{array} \end{aligned}$$
(22)
$$\begin{aligned}&\varvec{\phi }(k,{\tilde{\alpha }})=\left[ \phi _1(k,{\tilde{\alpha }}) \phi _2(k,{\tilde{\alpha }}) \cdots \phi _{n_{p}}(k,{\tilde{\alpha }})\right] ^{\text {T}} \end{aligned}$$
(23)
$$\begin{aligned}&\begin{array}{cl} \varvec{\phi }_{j}(k,{\tilde{\alpha }})=\left[ \begin{array}{rlc} \Delta ^{{\tilde{\alpha }}}f_j(u(k-1)) \cdots \Delta ^{{\tilde{\alpha }}}f_j(u(k- n_b)) \end{array}\right] \\ {} &{}\\ \text {for}\quad j=1,\ldots , n_p \end{array} \end{aligned}$$
(24)

The unknown parameter vector \({\tilde{\varvec{\theta }}}\) is defined from Eq. (18) and Eq. (19) as follows:

$$\begin{aligned} \begin{array}{rll} {\tilde{\varvec{\theta }}}= \left[ {\varvec{a}}\quad q_2 {\varvec{a}} \cdots q_{n_q} {\varvec{a}} \quad {\varvec{b}} \quad p_2 {\varvec{b}} \cdots p_{n_{p}} {\varvec{b}} \right] ^\text {T} \end{array} \end{aligned}$$
(25)

The Hammerstein–Wiener system identification requires the estimation of the parameter vector \(\varvec{\theta }\) which contains the parameters of the linear block and the nonlinear blocks as well as the fractional order:

$$\begin{aligned} \varvec{\theta }=\left[ {\tilde{\varvec{\theta }}}^{\text {T}} \quad {\tilde{\alpha }} \right] \in {\mathbb {R}}^{n_{\varvec{\theta }}} \quad n_{\varvec{\theta }}=n_an_q+n_bn_p+1 \end{aligned}$$
(26)

The quality of the estimation procedure is measured in terms of the following quadratic criterion:

$$\begin{aligned} J= & {} \dfrac{1}{K} \sum \limits _{k=1}^{K}\varepsilon ^{2}(k) \end{aligned}$$
(27)

where K is the data length, \(\varepsilon (k)\) is the prediction error to be minimized with:

$$\begin{aligned} \varepsilon (k)=y(k)-{\hat{y}}(k)= y(k)-\varvec{\varphi }^\text {T}(k,\hat{{\tilde{\alpha }}}){\hat{\tilde{\varvec{\theta }}}} \end{aligned}$$
(28)

\({\hat{y}}(k)\), \({\hat{\tilde{\varvec{\theta }}}}\) and \(\hat{{\tilde{\alpha }}} \) are, respectively, the estimates of y(k), \({\tilde{\varvec{\theta }}}\) and \({\tilde{\alpha }}\). L-M algorithm uses the following recurrence equation:

$$\begin{aligned} \varvec{\theta }^{(i+1)}=\varvec{\theta }^{(i)}-\left\{ \left[ J^{''}+ \lambda I \right] ^{-1}J^{'}\right\} _{\varvec{\theta }=\varvec{\theta }^{(i)}} \end{aligned}$$
(29)

The update rule is based on the calculation of the gradient and the Hessian \(J^{\prime }\) and \(J^{\prime \prime }\) with respect to each parameter of \(\varvec{\theta }\), and \(\lambda \) is a tuning parameter for the convergence. The reformulation of the H-W model output equation under a regression form allows a better model parameterization, and the gradient and the Hessian can be obtained in a closed form without using the parametric sensitivity functions. Based on the regression form of the prediction error Eq. (28), they are easily computed by deriving the quadratic functional of Eq. (27) with respect to \({\tilde{\varvec{\theta }}}\):

$$\begin{aligned}&\begin{array}{lll} J^{\prime }_{{\tilde{\varvec{\theta }}}}= & {} - \dfrac{2}{K} \varvec{\varphi }^\text {T}(k,{\tilde{\alpha }})\left[ y(k)-\varvec{\varphi }^\text {T}(k,{\tilde{\alpha }}){\tilde{\varvec{\theta }}}\right] \end{array} \end{aligned}$$
(30)
$$\begin{aligned}&\begin{array}{lll} J^{\prime \prime }_{{\tilde{\varvec{\theta }}}}= & {} \dfrac{2}{K}\varvec{\varphi }^\text {T}(k,{\tilde{\alpha }})\varvec{\varphi }(k,{\tilde{\alpha }}) \end{array} \end{aligned}$$
(31)
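Stacking \(\varvec{\varphi }^\text {T}(k,{\tilde{\alpha }})\) over the K samples into a matrix, Eqs. (30)–(31) can be sketched as follows (the matrix formulation and the function name are ours):

```python
import numpy as np

def gradient_hessian(Phi, y, theta):
    """Closed-form gradient and Hessian of J = (1/K) * sum(eps^2)
    with respect to theta, for the regression form y = Phi @ theta + v
    (Eqs. (30)-(31)); Phi stacks phi^T(k, alpha~) row by row."""
    K = len(y)
    eps = y - Phi @ theta               # prediction errors, Eq. (28)
    grad = -(2.0 / K) * Phi.T @ eps     # J'_theta, Eq. (30)
    hess = (2.0 / K) * Phi.T @ Phi      # J''_theta, Eq. (31)
    return grad, hess
```

A useful check: because J is quadratic in \({\tilde{\varvec{\theta }}}\), a single undamped Newton step from any starting point lands on the least-squares optimum, and the gradient vanishes there.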

The calculation of the gradient and the Hessian with respect to the fractional order \({\tilde{\alpha }}\) (\(J^{\prime }_{{\tilde{\alpha }}}\) and \(J^{\prime \prime }_{{\tilde{\alpha }}}\)) can be performed as follows:

$$\begin{aligned} J^{\prime }_{{\tilde{\alpha }}}= & {} -\dfrac{2}{K}\left[ \dfrac{\partial \left( \varvec{\varphi }^\text {T}(k,{\tilde{\alpha }}) {\tilde{\varvec{\theta }}}\right) }{\partial {\tilde{\alpha }}}\right] ^\text {T}\left[ y(k)-\varvec{\varphi }^ \text {T}(k,{\tilde{\alpha }}){\tilde{\varvec{\theta }}}\right] \nonumber \\= & {} -\dfrac{2}{K}\left[ \dfrac{\partial {\hat{y}}(k)}{\partial {\tilde{\alpha }}}\right] ^\text {T} \left[ y(k)-\varvec{\varphi }^\text {T}(k, {\tilde{\alpha }}){\tilde{\varvec{\theta }}}\right] \nonumber \\= & {} -\dfrac{2}{K}(\sigma _{{\hat{y}}(k)/{\tilde{\alpha }}})^\text {T} \left[ y(k)-\varvec{\varphi }^\text {T}(k,{\tilde{\alpha }}){\tilde{\varvec{\theta }}}\right] \end{aligned}$$
(32)

where \(\sigma _{{\hat{y}}(k)/{\tilde{\alpha }}}=\dfrac{\partial {\hat{y}}(k)}{\partial {\tilde{\alpha }}}\) is the output sensitivity function with respect to \( {\tilde{\alpha }}\); it is calculated numerically:

$$\begin{aligned} \sigma _{{\hat{y}}/{\tilde{\alpha }}}\approx \dfrac{{\hat{y}}(k, {\tilde{\alpha }} + \delta {\tilde{\alpha }}) - {\hat{y}} (k, {\tilde{\alpha }})}{\delta {\tilde{\alpha }}} \end{aligned}$$
(33)

with \(\delta {\tilde{\alpha }}\) a small variation of \( {\tilde{\alpha }}\).
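A minimal sketch of the forward difference of Eq. (33), where `predict` is a hypothetical callable mapping a fractional order to the simulated model output:

```python
def order_sensitivity(predict, alpha, d_alpha=1e-6):
    """Forward-difference approximation of the output sensitivity
    d y_hat / d alpha~ (Eq. (33)); `predict` is assumed to map a
    fractional order to the simulated model output."""
    return (predict(alpha + d_alpha) - predict(alpha)) / d_alpha
```

The step \(\delta {\tilde{\alpha }}\) trades off truncation error (too large) against round-off error (too small); a value around \(10^{-6}\) is a common default for double precision.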

The Hessian \(J^{\prime \prime }_{{\tilde{\alpha }}}\) can be derived using:

$$\begin{aligned} J^{\prime \prime }_{{\tilde{\alpha }}}= & {} \dfrac{2}{K}\left( \dfrac{\partial {\hat{y}}(k)}{\partial {\tilde{\alpha }}} \right) ^\text {T}\left( \dfrac{\partial {\hat{y}}(k)}{\partial {\tilde{\alpha }}}\right) \nonumber \\= & {} \dfrac{2}{K} (\sigma _{{\hat{y}}(k)/{ \tilde{\alpha }}})^\text {T}(\sigma _{{\hat{y}}(k)/{\tilde{\alpha }}}) \end{aligned}$$
(34)

Hence, the gradient \({\varvec{J}}_{\varvec{\theta }}^{\prime }\) and the Hessian \({\varvec{J}}_{\varvec{\theta }}^{\prime \prime }\) with respect to the full parameter vector \(\varvec{\theta }\) are expressed by the following equations:

$$\begin{aligned} {\varvec{J}}^{'}_{\varvec{\theta }}= \left[ \begin{array}{c} J^{'}_{{\tilde{\varvec{\theta }}}}\\ J^{'}_{{\tilde{\alpha }}}\end{array}\right] =- \dfrac{2}{K} \left[ \begin{array}{c} \varvec{\varphi }^\text {T}(k,{\tilde{\alpha }})\left( y(k)-\varvec{\varphi }^\text {T}(k,{\tilde{\alpha }}){\tilde{\varvec{\theta }}}\right) \\ (\sigma _{{\hat{y}}(k)/{\tilde{\alpha }}})^\text {T}\left( y(k)-\varvec{\varphi }^\text {T}(k,{\tilde{\alpha }}){\tilde{\varvec{\theta }}}\right) \end{array}\right] \end{aligned}$$
(35)
$$\begin{aligned} {\varvec{J}}^{\prime \prime }_{\varvec{\theta }}= \left[ \begin{array}{c} J^{\prime \prime }_{{\tilde{\varvec{\theta }}}}\\ J^{\prime \prime }_{{\tilde{\alpha }}}\end{array}\right] = \dfrac{2}{K} \left[ \begin{array}{c} \varvec{\varphi }^\text {T}(k,{\tilde{\alpha }})\varvec{\varphi }(k,{\tilde{\alpha }})\\ \left( \sigma _{{\hat{y}}(k)/{\tilde{\alpha }}}\right) ^\text {T}\left( \sigma _{{\hat{y}}(k)/{\tilde{\alpha }}}\right) \end{array}\right] \end{aligned}$$
(36)

The main steps of the developed approach can be summarized as follows:

  1. Collect the input–output data set [u(k), y(k)].

  2. Set \(i=1\), and choose the initial values \({{\tilde{\varvec{\theta }}}}^{0}\), \({\tilde{\alpha }}^{0}\) and \(\delta {\tilde{\alpha }}\).

  3. Form \(\varvec{\varphi }(k,{\tilde{\alpha }})\) using Eq. (20).

  4. Compute the output fractional order sensitivity function \(\sigma _{{\hat{y}}(k)/ {\tilde{\alpha }}}\) using Eq. (33).

  5. Compute \(J^{'}\) using Eq. (35) and \(J^{''}\) using Eq. (36).

  6. Update the parameter estimate \(\varvec{\theta }^{(i+1)}\) using Eq. (29).

  7. Compute the quadratic criterion using Eq. (27).

  8. If \(J(\varvec{\theta }^{(i+1)})<J(\varvec{\theta }^{(i)})\), accept the update, set \(\hat{\varvec{\theta }}=\varvec{\theta }^{(i+1)}\) and \(J(\hat{\varvec{\theta }})=J(\varvec{\theta }^{(i+1)})\), and decrease \(\lambda \); otherwise, increase \(\lambda \) and keep \(\varvec{\theta }^{(i)}\). Increment i and go to step 3 until convergence.
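Under the simplifying assumption that the fractional order is held fixed, the iteration of steps 3–8 can be sketched on the regression form as follows (the function name and the λ adaptation factors of 10 are illustrative choices, not prescribed by the paper):

```python
import numpy as np

def levenberg_marquardt(Phi, y, theta0, lam=1e-2, n_iter=50):
    """Sketch of the damped update of Eq. (29) on the regression form
    y = Phi @ theta + v.  Standard L-M bookkeeping: a step that lowers J
    is accepted and lambda is decreased; otherwise lambda is increased."""
    K = len(y)
    cost = lambda th: np.sum((y - Phi @ th) ** 2) / K        # Eq. (27)
    theta = theta0.copy()
    J = cost(theta)
    for _ in range(n_iter):
        eps = y - Phi @ theta
        grad = -(2.0 / K) * Phi.T @ eps                      # Eq. (30)
        hess = (2.0 / K) * Phi.T @ Phi                       # Eq. (31)
        trial = theta - np.linalg.solve(hess + lam * np.eye(len(theta)), grad)
        J_trial = cost(trial)
        if J_trial < J:               # improvement: accept, relax damping
            theta, J, lam = trial, J_trial, lam / 10.0
        else:                         # no improvement: reject, increase damping
            lam *= 10.0
    return theta, J
```

On a noise-free linear-in-the-parameters problem this loop converges to the true parameters in a handful of accepted steps, since the damped update interpolates between Gauss–Newton (small λ) and steepest descent (large λ).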

The H-W obtained parameter estimates \( \hat{{\varvec{a}}}\),   \( \hat{{\varvec{b}}}\),   \( \hat{{\varvec{p}}}\),   \( \hat{{\varvec{q}}}\) and \( \hat{{\tilde{\alpha }}}\) can be read from the vector \(\hat{\varvec{\theta }}\) as follows:

The estimates of the elements of \({\varvec{a}}\) are the first \(n_{{\varvec{a}}}\) entries of \(\hat{\varvec{\theta }}\); the estimate \(\hat{{\varvec{b}}}\) corresponds to the first \(n_{{\varvec{b}}}\) entries of the block running from position (\(n_{{\varvec{a}}}n_{{\varvec{q}}}+1\)) to (\(n_{{\varvec{a}}}n_{{\varvec{q}}}+n_{{\varvec{b}}}n_{{\varvec{p}}}\)), and \({{\tilde{\alpha }}}\) is the final entry. From Eq. (25), we notice that each \(q_j\) appears in \(n_{{\varvec{a}}}\) products; hence, the mean value may be computed as its estimate:

$$\begin{aligned} {\hat{q}}_{j}=\frac{1}{n_{{\varvec{a}}}}\sum \limits _{i=1}^{n_{{\varvec{a}}}} \frac{\varvec{\hat{\theta }}_{(j-1)n_{{\varvec{a}}}+i}}{{\hat{a}}_{i}} \; j=2, \ldots ,n_{{\varvec{q}}} \end{aligned}$$
(37)

Similarly, the estimate of \({\varvec{p}}\) is deduced

$$\begin{aligned} {\hat{p}}_{j}=\frac{1}{n_{{\varvec{b}}}} \sum \limits _{i=1}^{n_{{\varvec{b}}}}\frac{\varvec{\hat{\theta }}_{ n_{{\varvec{a}}}n_{{\varvec{q}}}+(j-1)n_{{\varvec{b}}}+i}}{ \varvec{{\hat{b}}}_{i}} \; j=2, \ldots ,n_{{\varvec{p}}} \end{aligned}$$
(38)
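A sketch of this extraction step, assuming the layout of Eq. (25) and the averaging of Eqs. (37)–(38); the function name and argument order are ours:

```python
import numpy as np

def split_estimates(theta_hat, na, nb, nq, np_):
    """Recover a, b, q, p from the estimated compound vector of Eq. (25),
    theta~ = [a, q2*a, ..., q_nq*a, b, p2*b, ..., p_np*b], averaging the
    redundant ratios as in Eqs. (37)-(38).  (p1 = q1 = 1 by normalization.)"""
    a = theta_hat[:na]
    b = theta_hat[na * nq : na * nq + nb]
    q = np.concatenate(([1.0], [np.mean(theta_hat[j * na : (j + 1) * na] / a)
                                for j in range(1, nq)]))
    p = np.concatenate(([1.0], [np.mean(theta_hat[na * nq + j * nb :
                                                  na * nq + (j + 1) * nb] / b)
                                for j in range(1, np_)]))
    return a, b, q, p
```

Averaging the \(n_{{\varvec{a}}}\) (resp. \(n_{{\varvec{b}}}\)) redundant ratios reduces the variance of the recovered nonlinear coefficients when the compound estimates are noisy.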

The effectiveness of the developed method is tested in the next section using numerical simulations.

5 Simulation examples

Two examples are presented in this section: the first one is an academic example that illustrates the statistical performance of the proposed algorithm for different signal-to-noise ratios (SNR); the second example is a real experiment of a flexible robot arm which is intended to be modeled with a fractional H-W structure.

Without loss of generality, the nonlinear functions f and g are assumed to be polynomials of orders \(n_{{\varvec{p}}}\) and \(n_{{\varvec{q}}}\), respectively.

The first step required for good system identification is the choice of the model structure; in the present work, it consists in determining the orders \(n_{{\varvec{a}}}\), \(n_{{\varvec{b}}}\), \(n_{{\varvec{p}}}\) and \(n_{{\varvec{q}}}\) of the linear and nonlinear blocks. To this end, different values are tried out along with the estimation procedure, and the structure with the smallest criterion is selected.

5.1 Example 1: academic example

Let us consider the fractional Hammerstein–Wiener system of commensurate order \({\tilde{\alpha }}=0.6\), with \(n_{{\varvec{a}}}= 2\), \(n_{{\varvec{b}}}=2\). The nonlinear parts \({\varvec{f}}\) and \({\varvec{g}}\) adopt the polynomials form of orders \(n_{{\varvec{p}}}=2\), and \( n_{{\varvec{q}}}=3\), respectively.

$$\begin{aligned} A\left( z\right)= & {} a{_1}z^{-{\tilde{\alpha }}}+a_{{2}}z^{-2{\tilde{\alpha }}}\nonumber \\ B\left( z\right)= & {} b{_1}z^{-{\tilde{\alpha }}}+b_{2}z^{-2{\tilde{\alpha }}}\nonumber \\ f\left( u(k)\right)= & {} \sum \limits _{j=1}^{2}p_jf_j(u(k))\nonumber \\ g\left( y(k)\right)= & {} \sum \limits _{j=1}^{3}q_jg_j(y(k)) \end{aligned}$$
(39)

The overall output system equation is as follows:

$$\begin{aligned} y(k)= & {} \sum \limits _{j=1}^{3}q_j\sum \limits _{i=1}^{2}a_i\Delta ^{0.6}g_j(y(k-i))\nonumber \\&+\sum \limits _{j=1}^{2}p_j\sum \limits _{i=1}^{2}b_i\Delta ^{0.6}f_j(u(k-i))\nonumber \\ \quad&+v(k) \end{aligned}$$
(40)

with

$$\begin{aligned}&f_1(u(k-i))=u(k-i)\nonumber \\&f_2(u(k-i))=u^2(k-i)\nonumber \\&g_1(y(k-i))=y(k-i)\nonumber \\&g_2(y(k-i))=y^2(k-i)\nonumber \\&g_3(y(k-i))=y^3(k-i) \end{aligned}$$
(41)
$$\begin{aligned}&y(k)=a_1\Delta ^{0.6}y(k-1)+a_2\Delta ^{0.6}y(k-2)\nonumber \\&\qquad \qquad +q_2a_1\Delta ^{0.6}y^2(k-1)+q_2a_2\Delta ^{0.6}y^2(k-2)\nonumber \\&\qquad \qquad +q_3a_1\Delta ^{0.6}y^3(k-1)+q_3a_2\Delta ^{0.6}y^3(k-2)\nonumber \\&\qquad \qquad +b_1\Delta ^{0.6}u(k-1)+b_2\Delta ^{0.6}u(k-2) \nonumber \\&\qquad \qquad + p_2b_1\Delta ^{0.6}u^2(k-1)+p_2b_2\Delta ^{0.6}u^2(k-2) \nonumber \\&\qquad \qquad + v(k) \end{aligned}$$
(42)

where the parameter vectors to be estimated are:

$$\begin{aligned} {\varvec{a}} \;= & {} \; [\; a_1\quad a_2\; ]^\text {T} \; = \; [\;0.1\quad 0.2\; ]^\text {T} \nonumber \\ {\varvec{b}} \;= & {} \; [\; b_1\quad b_2\;]^\text {T} \; = \;[ \;-0.4\quad -0.2 \; ]^\text {T}\nonumber \\ {\varvec{p}} \;= & {} \; [ \;p_1\quad p_2 \;]^\text {T} \;= \;[ \;1 \quad 0.5 \;]^\text {T}\nonumber \\ {\varvec{q}} \;= & {} \;[ \;q_1\quad q_2\quad q_3 \;]^\text {T}= \;[ \;1\quad 0.7\quad 0.35 \;]^\text {T} \end{aligned}$$
(43)

The input u(k) is taken as a persistent excitation sequence with zero mean and unit variance, and v(k) is a white noise sequence with zero mean and constant variance. The data set is of length \(K=1000\).
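A noise-free simulation of Eq. (42) can be sketched as follows, using the G-L difference of Eq. (5) with zero initial conditions (the function name and signature are ours; the implementation is a straightforward, non-optimized sketch):

```python
import numpy as np

def simulate_hw(u, a, b, p2, q2, q3, alpha=0.6):
    """Noise-free simulation of Eq. (42), assuming h = 1 and zero initial
    conditions.  Delta^alpha x(k) = sum_{j=0}^{k} beta(j) x(k - j)."""
    K = len(u)
    beta = np.empty(K)
    beta[0] = 1.0
    for j in range(1, K):
        beta[j] = beta[j - 1] * (j - 1 - alpha) / j

    def frac(x, k):                  # Delta^alpha x evaluated at sample k
        if k < 0:
            return 0.0
        return beta[: k + 1] @ x[k::-1]

    y = np.zeros(K)
    for k in range(1, K):
        y[k] = sum(a[i] * (frac(y, k - 1 - i)
                           + q2 * frac(y ** 2, k - 1 - i)
                           + q3 * frac(y ** 3, k - 1 - i))
                   for i in range(len(a))) \
             + sum(b[i] * (frac(u, k - 1 - i)
                           + p2 * frac(u ** 2, k - 1 - i))
                   for i in range(len(b)))
    return y
```

As a sanity check, switching off the nonlinearities and the feedback (\(a_i=0\), \(p_2=q_2=q_3=0\), \(b_1=1\), \(b_2=0\)) and feeding a unit impulse returns the G-L coefficients \(\beta (k-1)\) at the output, i.e., \(1, -0.6, -0.12, \ldots \) for \({\tilde{\alpha }}=0.6\).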

In the first phase, the right structure has to be determined, and various combinations of \(n_a\), \(n_b\), \(n_p\), \(n_q\) are tested. The criteria evolution versus the iteration number for the best structures is represented in Fig. 3, and the values of J obtained for each structure are tabulated in Table 1; the exact structure is recovered with a cost function \(J\simeq 0\).

In the second phase, and through the use of the best structure, the algorithm is evaluated in the absence of noise and with noisy data, for different signal-to-noise ratios (SNR): 34 dB and 26 dB.

The characteristics of the fractional H-W system for the noise-free case are shown graphically in Fig. 4: the first plot compares the estimated output to the simulated one, while the second represents the prediction error. The results are satisfactory: the error is practically null and the estimated output overlaps the data.

For noisy measurements, with \(SNR=34\) dB and \(SNR=26\) dB, the comparison of the real output and the estimated one along with their respective prediction errors is depicted in Figs. 5 and  6 for each SNR.

From the obtained results, we can conclude that the estimated outputs match the real data with very good adequacy and that the errors are relatively small.

To test the robustness of the algorithm, a Monte Carlo simulation is performed for 50 sets of noise realization, with \(SNR=34\) dB and \(SNR=26\) dB.

The mean values and the variance of the estimated parameters, and the obtained criteria J are listed in Table 2.

It can be noticed that the parameters of the linear part, the nonlinear parts and the fractional order converge to their exact values. The resulting criterion J equals \(3.2\times 10^{-4}\) for \(SNR=34\) dB and \(2.4\times 10^{-3}\) for \(SNR=25\) dB. These results verify the effectiveness of the proposed method and confirm its statistical performance.

Table 1 Structure test results of Example 1
Fig. 3
figure 3

Evolution of the criteria versus the number of iterations

Fig. 4
figure 4

Identification results for the noise-free case

Fig. 5
figure 5

Identification results for Example 1 for \(SNR=34\) dB

Fig. 6
figure 6

Identification results for Example 1 for \(SNR=25\) dB

Table 2 Monte Carlo Simulation Results

5.2 Application to a robot arm benchmark

To illustrate further the method, a benchmark data set taken from the identification database DAISY (Database for the Identification of Systems) [37] is used.

The identification of the experimental flexible robot arm shown in Fig. 7 is considered. The system consists of an arm installed on an electrical motor; the input is the reaction torque of the structure on the ground, and the output is the acceleration of the flexible arm. The measured data set contains 1024 samples and is divided into two parts: the first is used for the identification task, while the second is reserved for the validation procedure.

Fig. 7
figure 7

Flexible robot arm

The input/output signals of the experimental robot arm are represented in Fig. 8.

This nonlinear benchmark has been modeled in the literature using neural networks, and classical NARX and NLARX structures [38, 39], and in this study, the fractional Hammerstein–Wiener model is tested.

Fig. 8
figure 8

System input and output

In the first step, the best model structure is investigated: different choices of the orders \(n_{{\varvec{a}}}\), \(n_{{\varvec{b}}}\), \(n_{{\varvec{p}}}\) and \(n_{{\varvec{q}}}\) are tested, and the best structure is determined by the lowest value of the quadratic criterion J. The criteria evolution of the different structures is illustrated in Fig. 9, and Table 3 shows the obtained values of the criterion J.

We can conclude that the best model structure is obtained for the orders \(n_{{\varvec{a}}}=3\), \(n_{{\varvec{b}}}=5\), \(n_{{\varvec{p}}}=2\), and \(n_{{\varvec{q}}}=1\), with the mean square error \(J=9\times 10^{-3}\).

Using this structure, the robot arm’s measured output is compared with the estimated one in Fig. 10 along with the prediction error.

The estimated output corresponds to the real data with a good adequacy, and the error is small. The validation results are depicted in Fig. 11.

Table 3 Structure test of the arm robot system
Fig. 9
figure 9

Criteria J versus number of iterations of the benchmark example

Fig. 10
figure 10

Arm robot identification results

Fig. 11
figure 11

Validation results

The estimated model of the robot arm benchmark is a fractional H-W model of order \({\tilde{\alpha }}=0.701\), with the parameters given by the vectors \({\varvec{a}}\), \({\varvec{b}}\), \({\varvec{p}}\), \({\varvec{q}}\) as follows:

$$\begin{aligned} {\varvec{a}}= & {} [ 0.407 \quad -0.278 \quad -0.024]^\text {T}, \\ {\varvec{b}}= & {} [ -0.378 \quad 0.039 \quad 0.027 \quad -0.013 \quad -10.724]^\text {T}, \\ {\varvec{p}}= & {} [1\quad -10.724]^T,\quad \quad {\varvec{q}}=[1]. \end{aligned}$$

The fractional H-W model simulation results show a good agreement with the experimental robot arm data, and the parameters are estimated with relatively smaller errors than those reported in the literature. Moreover, a reduction in model complexity is achieved, since the model has 11 parameters versus 16 or more in the literature, and the identification is completed within 40 iterations. This confirms the efficiency of the proposed identification method.

6 Conclusion

In this paper, a novel approach for fractional order H-W system identification is developed. An output error framework is adopted, based on the robust Levenberg–Marquardt algorithm. The difficulty of implementing the parametric sensitivity functions is circumvented by reformulating the H-W model output in a regression form. The main advantage is that the gradient and the Hessian equations can be derived easily, so the identification burden is drastically reduced.

The method’s efficiency has been confirmed on an academic example, where consistent estimates of the subsystem parameters and the fractional order are obtained. The estimator’s statistical performance in the presence of noise is verified using Monte Carlo simulations.

The quality of a nonlinear model requires a balance between accuracy, the number of parameters and the computational load of the identification. The application of the fractional H-W model to a flexible robot arm validates the performance of the developed identification methodology, where a satisfactory model fit is achieved with a reduced number of parameters.

Further work will consider the extension of the identification approach to other fractional cascaded block-oriented models.