1 Introduction

Fractional order model identification faces an additional challenge as it requires estimating the model orders over a large search space. Moreover, if there is a time delay, the parameter estimation problem becomes further complicated due to the nonlinear appearance of the delay term in the model equation. To overcome these issues, several approaches have been adopted in the literature. Victor et al. [40] used a two-stage algorithm that estimates the orders using an optimization approach and the model coefficients by solving a least-squares equation. Narang et al. [25] used a similar approach for time delay models. There are only a few methods for simultaneous estimation of all parameters of fractional order models [36]. However, these methods are not applicable to the step input, although the characteristics of step responses of fractional order models have been studied in the literature [22, 35, 37]. To the best of the author’s knowledge, there is no reported method that simultaneously estimates the orders, coefficients and delay from the step response. Ahmed [1] presented a method for estimation of the coefficients and delay, but only for known fractional orders.

In this article, an output error approach for simultaneous estimation of the model orders, coefficients and time delay is proposed by adopting the Newton and Gauss–Newton optimization algorithms. The optimization approach requires estimation of the output sensitivity functions. For fractional order models, these functions involve the logarithmic derivatives of the input signal. Victor and Malti [39] and Victor et al. [40] commented that simulating the logarithmic derivative is not trivial and used a numerical approach to calculate the sensitivity functions. The contribution of this article lies in the presentation of analytical expressions for the logarithmic derivatives of the step input signal and the derivation of analytical expressions for the Jacobian and the Hessian required for Newton’s algorithm. The efficacy of the algorithm lies in its ability to identify all the parameters simultaneously.

The output error approach is the most commonly used methodology for fractional order model identification [10, 17, 31, 39, 40]. Other approaches include an extension [18] of the so-called simplified refined instrumental variable algorithm [41], use of the state variable filter [8], fractional Laguerre basis function modeling [3], frequency domain identification [26], ARX model development using fractional order and orthonormal basis filters [23], an integral-based approach [34] and so on. The fractional order nonlinear system identification problem has also been addressed in the literature [19, 33]. The time delay adds an extra degree of complexity to any identification method due to its nonlinear appearance. Narang et al. [25] extended the linear filter method [2] to fractional order model identification, iteratively estimating the model parameters and the delay. Tavakoli-Kakhki and Tavazoei [36] and Yuan et al. [42] also considered time delay estimation.

Motivation for addressing the fractional order identification problem arises from the advantages of fractional order models [4, 20] and their use in real-life applications. Reported applications include thermal processes [12], processes involving diffusive mass transfer [5], the Archimedes wave swing [38], vibration suppression [24], health monitoring [15] and other engineering applications, see, e.g., [22, 40]. Another motivation is the use of fractional order controllers. With the introduction of the TID controller [16], the \(PI^{\lambda }D^{\mu }\) controller [30], the CRONE controller [29] and the lead-lag compensator [32], fractional order representations of processes and controllers are gaining more and more attention.

The remainder of the article is organized as follows. The proposed methodology is outlined in Sect. 2 that includes the optimization algorithm, expressions for the logarithmic derivatives and the initialization procedure. Simulation results along with the simulation method for fractional order differentiation and integration are presented in Sect. 3. Concluding remarks are drawn in Sect. 4.

2 Mathematical Formulations

The proposed identification method is outlined in this section. The method is based on output error optimization. Analytical expressions for the logarithmic derivatives of the step input, required to evaluate the Jacobian and the Hessian, are also presented.

For convenience in presentation, both time and Laplace domain expressions are used interchangeably. The Laplace domain expressions are presented in terms of ‘s’ while an equivalent symbol ‘p’ is used to represent the derivative operator in the time domain. Considering that step response methods are typically used for models with parsimonious structures, mathematical derivations are provided for two structures as in (1) and (2). We will refer to these two structures as Class I and Class II models, respectively, throughout the manuscript. Although the method is demonstrated using these two structures, the same approach can be followed for a general structure with more parameters.

$$\begin{aligned} \mathrm {Class}\;\mathrm {I}: \;\;G(s)=\frac{be^{-\delta s}}{s^{\alpha }+a} \end{aligned}$$
(1)
$$\begin{aligned} \mathrm {Class}\;\mathrm {II}: \;\; G(s)=\frac{b e^{-\delta s}}{s^{\alpha _2} +a_1 s^{\alpha _1}+a_0} \end{aligned}$$
(2)

2.1 Parameter Estimation

For a single input single output system, the relation between the input and the output can be expressed using the following Laplace domain expression.

$$\begin{aligned} Y(s) = G(s,\varvec{\theta }) U(s) + E(s) \end{aligned}$$
(3)

where U(s) and Y(s) are the input and the output, respectively, \(G(s,\varvec{\theta })\) is the model transfer function with \(\varvec{\theta } =[\theta _{1}, \ldots , \theta _{n}]\) as the set of parameters and n being the total number of parameters. E(s) represents the noise in the output measurements. For a fractional order model, the parameter vector contains the coefficients of the numerator and denominator polynomials, the time delay as well as the fractional orders of the derivatives. The objective of an identification algorithm is to estimate the parameters \(\varvec{\theta }\) from a set of time domain measurements \([u(t_k)\;y(t_k)], k=1,2,\ldots, N\), with N being the number of available data points. The goal of the output error approach is to estimate \(\varvec{\theta }\) by minimizing a norm of the errors between the measured and model outputs.

$$\begin{aligned} E(s, \varvec{\theta }) = Y(s)- G(s,\varvec{\theta }) U(s) \end{aligned}$$
(4)

An equivalent time domain expression for the error is given by

$$\begin{aligned} e(t,\varvec{\theta }) = y(t) - G(p,\varvec{\theta }) u(t) \end{aligned}$$
(5)

The lower case letters correspond to the variables in the time domain. Using the notation \(e_k = e(t_k,\varvec{\theta })\), the goal of an output error algorithm can be stated as minimizing the following objective function

$$\begin{aligned} f(\varvec{\theta })=\sum ^N_{k=1}e^2_k = \parallel \mathbf{e} \parallel ^2 \end{aligned}$$
(6)

A number of different approaches can be taken for the solution of the optimization problem. We follow Newton’s algorithm to simultaneously estimate all the parameters. In this approach, the parameter estimate at iteration step \(i+1\) is given by

$$\begin{aligned} \varvec{\theta }^{i+1} = \varvec{\theta }^{i} -\left[ \varvec{H}(\varvec{\theta }^{i})\right] ^{-1} g(\varvec{\theta }^{i}) \end{aligned}$$
(7)

where \(g(\varvec{\theta })\) is the gradient of \(f(\varvec{\theta })\) given by

$$\begin{aligned} g(\varvec{\theta }) = 2\varvec{A} \varvec{e} \end{aligned}$$
(8)

with \(\varvec{A}\) being the Jacobian matrix.

$$\begin{aligned} \varvec{A} = \left[ \begin{array}{cccc} \nabla e_1&\nabla e_2&\cdots&\nabla e_N \end{array}\right] \end{aligned}$$
(9)

The columns of \(\varvec{A}\) are the first derivative vectors \(\nabla e_k\) of the components of \(\varvec{e}\), i.e.,

$$\begin{aligned} A_{jk}=\frac{\partial e_k}{\partial \theta _j} \quad j=1,\ldots ,n \quad k=1,\ldots ,N \end{aligned}$$
(10)

where \(\theta _j\) is the j-th element of \(\varvec{\theta }\). The Hessian, \(\varvec{H}\) is defined as

$$\begin{aligned} \varvec{H} = \nabla ^2 f(\varvec{\theta }) \end{aligned}$$
(11)

Evaluation of the Hessian requires

$$\begin{aligned} \frac{\partial ^2}{\partial \theta _l \partial \theta _j} \sum ^N_{k=1}e^2_k= & {} 2 \frac{\partial }{\partial \theta _l} \sum ^N_{k=1}e_k \frac{\partial e_k}{\partial \theta _j} \nonumber \\= & {} 2 \sum ^N_{k=1} \frac{\partial e_k}{\partial \theta _l} \frac{\partial e_k}{\partial \theta _j} + 2 \sum ^N_{k=1} e_k \frac{\partial ^2 e_k}{\partial \theta _l \partial \theta _j} \end{aligned}$$
(12)

So the Hessian is given by

$$\begin{aligned} \varvec{H} = 2 \varvec{A} \varvec{A}^T + 2 \sum ^N_{k=1} e_k \varvec{R_k} \end{aligned}$$
(13)

where \(\varvec{A^T}\) is the transpose of \(\varvec{A}\), and

$$\begin{aligned} \varvec{R_k} = \left( \begin{array}{cccc} \frac{\partial ^2 e_k}{\partial \theta _1^2} &{} \frac{\partial ^2 e_k}{\partial \theta _1 \partial \theta _2} &{}\cdots &{} \frac{\partial ^2 e_k}{\partial \theta _1 \partial \theta _n} \\ \vdots &{} \ddots &{} \vdots &{} \vdots \\ \frac{\partial ^2 e_k}{\partial \theta _n \partial \theta _1} &{} \frac{\partial ^2 e_k}{\partial \theta _n \partial \theta _2} &{}\cdots &{}\frac{\partial ^2 e_k}{\partial \theta _n^2}\\ \end{array}\right) \end{aligned}$$
(14)

The main disadvantage of the above approach, namely Newton’s method, is that it requires formulae from which the second derivative matrix can be evaluated. However, there are closely related methods which use only the first derivatives. One such approach is the finite difference Newton method, in which \(\varvec{H}(\varvec{\theta }^i)\) is estimated column by column from differences of gradient vectors evaluated at increments along each coordinate direction [11]. Another approach is the class of so-called quasi-Newton methods, which approximate \([\varvec{H}(\varvec{\theta })]^{-1}\) by a symmetric positive definite matrix and update it as the iterations proceed.
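As an illustration of the finite difference Newton idea, the following sketch approximates the Hessian column by column from differences of gradient vectors; the function name, the supplied gradient routine and the increment size are illustrative assumptions rather than part of the proposed method.

```python
import numpy as np

def fd_hessian(grad, theta, h=1e-6):
    """Finite-difference Hessian: column j is approximated from the
    difference of gradient vectors taken along the j-th coordinate."""
    n = theta.size
    g0 = grad(theta)
    H = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        H[:, j] = (grad(theta + step) - g0) / h
    # symmetrize, since the finite-difference estimate need not be exactly symmetric
    return 0.5 * (H + H.T)
```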

The second term in (13) contains the errors \(e_k\) as multipliers, which become small as the iteration approaches the optimum. This leads to the approximation of \(\varvec{H}\) as

$$\begin{aligned} \varvec{H} \approx 2 \varvec{A} \varvec{A}^T \end{aligned}$$
(15)

Thus, the basic Newton’s method becomes the Gauss–Newton method when (15) is used. Accordingly, the solution of \(\varvec{\theta }\) in the \((i+1)\)-th iteration step is given by

$$\begin{aligned} \varvec{\theta }^{i+1} = \varvec{\theta }^{i} - \left[ \varvec{A}\varvec{A}^T\right] ^{-1} \varvec{A} \varvec{e} \end{aligned}$$
(16)

The basic Newton’s method as well as the Gauss–Newton method may not be suitable in many cases since \(\varvec{H}\) may not be positive definite when \(\varvec{\theta }^i\) is remote from the solution. Moreover, convergence may not occur even when \(\varvec{H}\) is positive definite [11, 21]. To avoid the latter case, Newton’s method with line search can be implemented, where the Newton correction is used to generate a search direction [11]. Following this approach, (7) is modified as

$$\begin{aligned} \varvec{\theta }^{i+1} = \varvec{\theta }^{i} - \lambda ^i \left[ \varvec{H}(\varvec{\theta }^{i})\right] ^{-1} g(\varvec{\theta }^{i}) \end{aligned}$$
(17)

The Gauss–Newton solution can also be modified similarly. The advantages and disadvantages of Newton’s method and the Gauss–Newton method have been widely addressed in the literature [11]. An advantage of the Gauss–Newton method is that the second derivative matrix is approximated using only first-order derivatives. On the other hand, the Gauss–Newton method is equivalent to making a linear approximation of the residuals, and hence the method is valuable only when either the residuals or the degree of nonlinearity is small [11]. While a detailed study of the applicability of these two approaches is beyond the scope of this article, observations from extensive simulation results are presented in the results section.

The optimization step follows a standard procedure. Using an initial guess of the parameters, the Jacobian and the Hessian are evaluated, and the parameters are iteratively updated until convergence. The next sections provide the analytical expressions for the matrices required to evaluate the Jacobian and the Hessian.
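A minimal sketch of the resulting iteration is given below. It implements the damped Gauss–Newton update of (16) and (17), assuming that the residual and Jacobian routines are supplied by the model-specific expressions of the following subsections; the function names and the convergence test are illustrative.

```python
import numpy as np

def gauss_newton(residual, jacobian, theta0, lam=0.5, tol=1e-8, max_iter=100):
    """Iterate theta^{i+1} = theta^i - lam * (A A^T)^{-1} A e, cf. (16) and (17).

    residual(theta) -> e, shape (N,)
    jacobian(theta) -> A, shape (n, N), whose columns are the gradients of e_k
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        e = residual(theta)
        A = jacobian(theta)
        step = lam * np.linalg.solve(A @ A.T, A @ e)
        theta = theta - step
        if np.linalg.norm(step) < tol:   # stop when the correction becomes negligible
            break
    return theta
```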

2.1.1 Class I Model

The parameter vector for the Class I model in (1) is given by \(\varvec{\theta }=[a\;b\;\delta \;\alpha ]^T\). The columns of the Jacobian can then be expressed in the Laplace domain as

$$\begin{aligned} \nabla E_k(s) = \left[ \begin{array}{cccc} \frac{b e^{-\delta s}}{(s^{\alpha } + a)^2} &{} \frac{-1e^{-\delta s}}{s^{\alpha } + a} &{} \frac{bse^{-\delta s}}{s^{\alpha } + a}&{} \frac{b s^{\alpha }e^{-\delta s}}{(s^{\alpha } + a)^2}\ln (s) \; \\ \end{array}\right] ^T U(s) \end{aligned}$$
(18)

An equivalent time domain expression is given by

$$\begin{aligned} \nabla e_k = \left[ \begin{array}{cccc} \frac{b}{(p^{\alpha } + a)^2} &{} \frac{-1}{p^{\alpha } + a} &{} \frac{bp}{p^{\alpha } + a}&{} \frac{b p^{\alpha }\ln (p)}{(p^{\alpha } + a)^2} \\ \end{array}\right] ^T u(t_k-\delta ) \end{aligned}$$
(19)

Here, \(\ln (s)\) and \(\ln (p)\) are logarithms of the derivative operator, expressed in the Laplace and time domain, respectively. The logarithmic operation is on the operator s or p. Similarly, the matrix \(\varvec{R_k}\) can be obtained in the time domain as

$$\begin{aligned} \varvec{R_k} = \left[ \begin{array}{cccc} \frac{-2b}{(p^{\alpha } + a)^3} &{}\quad \frac{1}{(p^{\alpha } + a)^2} &{}\quad \frac{-bp}{(p^{\alpha } + a)^2}&{} \quad \frac{-2b p^{\alpha }\ln (p)}{(p^{\alpha } + a)^3}\\ \frac{1}{(p^{\alpha } + a)^2} &{}\quad 0 &{}\quad \frac{p}{(p^{\alpha } + a)}&{}\quad \frac{ p^{\alpha }\ln (p)}{(p^{\alpha } + a)^2}\\ \frac{-bp}{(p^{\alpha } + a)^2} &{}\quad \frac{p}{(p^{\alpha } + a)} &{}\quad \frac{-bp^2}{(p^{\alpha } + a)}&{} \quad \frac{-b p^{\alpha +1}\ln (p)}{(p^{\alpha } + a)^2}\\ \frac{-2b p^{\alpha }\ln (p)}{(p^{\alpha } + a)^3} &{}\quad \frac{ p^{\alpha }\ln (p)}{(p^{\alpha } + a)^2} &{}\quad \frac{-b p^{\alpha +1}\ln (p)}{(p^{\alpha } + a)^2}&{}\quad \frac{-b p^{\alpha }(p^{\alpha }-a) \ln (p)\ln (p)}{(p^{\alpha } + a)^3}\\ \end{array}\right] u(t_k-\delta )\qquad \end{aligned}$$
(20)
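The Laplace domain sensitivities in (18) can be checked symbolically. The short script below is an illustrative verification (not part of the identification algorithm itself): it differentiates the Class I error term with respect to each parameter and should reproduce the entries of (18).

```python
import sympy as sp

s, a, b, delta, alpha, U = sp.symbols('s a b delta alpha U', positive=True)

G = b * sp.exp(-delta * s) / (s**alpha + a)   # Class I model (1)
E = -G * U                                    # theta-dependent part of the error, cf. (4)

# partial derivatives of E with respect to [a, b, delta, alpha]
for par in (a, b, delta, alpha):
    print(par, sp.simplify(sp.diff(E, par)))
# expected, cf. (18): b e^{-ds}U/(s^a+a)^2, -e^{-ds}U/(s^a+a),
#                     b s e^{-ds}U/(s^a+a), b s^a ln(s) e^{-ds}U/(s^a+a)^2
```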

2.1.2 Class II Model

For the Class II model in (2), the parameter vector is \(\varvec{\theta }=[a_1\;a_0\;b\; \delta \;\alpha _2\;\alpha _1]^T\). Denoting \(D(p)=p^{\alpha _2}+a_1 p^{\alpha _1}+a_0\), the expressions for the elements of \(\varvec{A}\) and \(\varvec{R_k}\) for the Class II model can be given as

$$\begin{aligned} \nabla e_k= & {} \left[ \begin{array}{cccccc} \frac{bp^{\alpha _1}}{[D(p)]^2} &{} \frac{b}{[D(p)]^2} &{} \frac{-1}{D(p)} &{} \frac{bp}{D(p)} &{} \frac{b p^{\alpha _2}\ln (p)}{[D(p)]^2}&{} \frac{b a_1p^{\alpha _1}\ln (p)}{[D(p)]^2}\\ \end{array}\right] ^T u(t_k-\delta ) \end{aligned}$$
(21)
$$\begin{aligned} \varvec{R_k}= & {} \left[ \begin{array}{cccc} \frac{-2bp^{2\alpha _1}}{[D(p)]^3} &{} \frac{-2bp^{\alpha _1}}{[D(p)]^3} &{} \frac{p^{\alpha _1}}{[D(p)]^2} &{} \frac{-bp^{\alpha _1+1}}{[D(p)]^2} \\ \frac{-2bp^{\alpha _1}}{[D(p)]^3} &{} \frac{-2b}{[D(p)]^3} &{} \frac{1}{[D(p)]^2} &{} \frac{-bp}{[D(p)]^2} \\ \frac{p^{\alpha _1}}{[D(p)]^2}&{} \frac{1}{[D(p)]^2} &{} 0 &{} \frac{p}{D(p)} \\ \frac{-bp^{\alpha _1+1}}{[D(p)]^2} &{} \frac{-bp}{[D(p)]^2} &{} \frac{p}{D(p)} &{} \frac{-bp^2}{D(p)} \\ \frac{-2bp^{\alpha _2+\alpha _1}\ln (p)}{[D(p)]^3} &{} \frac{-2b p^{\alpha _2}\ln (p)}{[D(p)]^3} &{} \frac{p^{\alpha _2}\ln (p)}{[D(p)]^2} &{} \frac{-bp^{\alpha _2+1}\ln (p)}{[D(p)]^2} \\ \frac{b p^{\alpha _1}(p^{\alpha _2}-a_1 p^{\alpha _1}+a_0)\ln (p)}{[D(p)]^3} &{} \frac{-2b a_1p^{\alpha _1}\ln (p)}{[D(p)]^3} &{} \frac{a_1p^{\alpha _1}\ln (p)}{[D(p)]^2} &{} \frac{-b a_1p^{\alpha _1+1}\ln (p)}{[D(p)]^2} \\ \end{array} \right. \nonumber \\&\left. \begin{array}{cc} \frac{-2bp^{\alpha _2+\alpha _1}\ln (p)}{[D(p)]^3} &{} \frac{b p^{\alpha _1}(p^{\alpha _2}-a_1 p^{\alpha _1}+a_0)\ln (p)}{[D(p)]^3}\\ \frac{-2b p^{\alpha _2}\ln (p)}{[D(p)]^3}&{} \frac{-2b a_1p^{\alpha _1}\ln (p)}{[D(p)]^3}\\ \frac{p^{\alpha _2}\ln (p)}{[D(p)]^2} &{} \frac{a_1p^{\alpha _1}\ln (p)}{[D(p)]^2}\\ \frac{-b p^{\alpha _2+1}\ln (p)}{[D(p)]^2}&{} \frac{-b a_1p^{\alpha _1+1}\ln (p)}{[D(p)]^2}\\ \frac{b p^{\alpha _2}(-p^{\alpha _2}+a_1 p^{\alpha _1}+a_0)\ln (p)\ln (p)}{[D(p)]^3}&{} \frac{-2b a_1p^{\alpha _2+\alpha _1}\ln (p)\ln (p)}{[D(p)]^3}\\ \frac{-2b a_1p^{\alpha _2+\alpha _1}\ln (p)\ln (p)}{[D(p)]^3}&{} \frac{b a_1p^{\alpha _1}(p^{\alpha _2}-a_1 p^{\alpha _1}+a_0)\ln (p)\ln (p)}{[D(p)]^3}\\ \end{array}\right] u(t_k-\delta )\nonumber \\ \end{aligned}$$
(22)

Evaluation of \(\varvec{A}\) and \(\varvec{R_k}\) requires estimation of \(\ln (p)u(t_k)\) as well as \(\ln (p)\ln (p)u(t_k)\). Victor et al. [40] suggested numerical estimation of the Jacobian, as the logarithm of the derivative operator is not trivial to simulate. We use analytical expressions to evaluate the logarithmic derivatives of the input signal.

2.2 Evaluation of the Logarithmic Derivative

The above methodology is applicable irrespective of the input type. As this article is concerned with the step input, analytical expressions for the logarithmic derivatives of the step, expressed as in (23), are derived.

$$\begin{aligned} u(t)=h \varOmega (T) \end{aligned}$$
(23)

where h is the size of the step input and \(\varOmega (T)\) is the unit step function defined as

$$\begin{aligned} \varOmega (T)= \left\{ \begin{array}{ll} 0 &{} \mathrm {for}\; t<T \\ 1 &{} \mathrm {for}\; t\ge T \end{array} \right. \end{aligned}$$
(24)

The logarithmic derivative \(\ln (p)\) of a constant c is expressed [4, 14] as

$$\begin{aligned} \ln (p) c = -c (\gamma + \ln t) \end{aligned}$$
(25)

where \(\gamma \) is the Euler–Mascheroni constant [27] given by

$$\begin{aligned} \gamma = \lim _{m\rightarrow \infty } \left( \sum _{q=1}^{m}\frac{1}{q} - \ln m\right) \end{aligned}$$
(26)

The numerical value of the Euler–Mascheroni constant can be approximated as \(\gamma \approx 0.57721\). Following the above expressions, \(\ln (p)\) of a step signal can be obtained as

$$\begin{aligned} \ln (p) h \varOmega (T) = -h (\gamma + \ln t) \varOmega (T) \end{aligned}$$
(27)

Also, estimation of \(\varvec{R_k}\) requires evaluation of \(\ln (p) \ln (p)\) which can be obtained from

$$\begin{aligned} \ln (p) \ln (p) h \varOmega (T) = \ln (p) [-h (\gamma + \ln t) \varOmega (T)] \end{aligned}$$
(28)

The logarithmic derivative of the logarithmic function was derived in [4] as

$$\begin{aligned} \ln (p) \ln t = -\zeta (2)- (\gamma + \ln t)\ln t \end{aligned}$$
(29)

where \(\zeta \) is the Riemann zeta function, a special case of the Hurwitz zeta function [27], given as

$$\begin{aligned} \zeta (v) = \lim _{m\rightarrow \infty }\sum _{q=1}^{m}\frac{1}{q^v} \end{aligned}$$
(30)

Following (30), \(\zeta (2)\) can be obtained as [13]

$$\begin{aligned} \zeta (2) = \frac{1}{1^2}+\frac{1}{2^2}+\frac{1}{3^2} +\cdots =\frac{\pi ^2}{6}\approx 1.645 \end{aligned}$$
(31)

Using (28) and (29), \(\ln (p) \ln (p)\) for the step input signal can be obtained as

$$\begin{aligned} \ln (p)\ln (p) h \varOmega (T) = h \left[ \zeta (2) +(\gamma + \ln t)^2 \right] \varOmega (T) \end{aligned}$$
(32)
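As a minimal sketch, the expressions (27) and (32) can be evaluated on a sampled time grid as follows; the function name, the assumption of strictly positive sampling instants and the handling of the step time T are illustrative.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329   # Euler–Mascheroni constant, cf. (26)
ZETA_2 = np.pi**2 / 6              # zeta(2) = pi^2/6, cf. (31)

def log_derivatives_of_step(t, h=1.0, T=0.0):
    """Evaluate ln(p)u and ln(p)ln(p)u for the step u(t) = h*Omega(T), cf. (27) and (32).

    The samples in t are assumed strictly positive so that ln(t) is finite.
    """
    t = np.asarray(t, dtype=float)
    omega = (t >= T).astype(float)                                    # unit step, cf. (24)
    lnp_u = -h * (EULER_GAMMA + np.log(t)) * omega                    # eq. (27)
    lnp_lnp_u = h * (ZETA_2 + (EULER_GAMMA + np.log(t))**2) * omega   # eq. (32)
    return lnp_u, lnp_lnp_u

# example: 1000 samples at interval 0.1, unit step applied at t = 0
t = 0.1 * np.arange(1, 1001)
lnp_u, lnp_lnp_u = log_derivatives_of_step(t, h=1.0, T=0.0)
```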

2.3 Implementation Issues

2.3.1 Initialization

Initialization plays an important role and poses a significant challenge for optimization schemes. In the proposed methodology, initialization of the model orders, coefficients and delay is required. For the orders, we propose to initiate the optimization algorithm assuming integer values. For example, for a Class I model an order of 1 is used as the initial guess. For Class II models, a second-order model is used as the initial guess. Regarding the coefficients, we propose to estimate an integer order model using a conventional identification method. In this article, the integral equation approach [9] is used to estimate the initial coefficients assuming a model without delay. The estimation procedure is detailed below. For the time delay, the initial guess can be obtained from the step response. We suggest using a small initial value for the delay.

To describe the integral equation approach, consider an example differential equation representing the input–output relation of a process.

$$\begin{aligned} y(t) = \frac{\nu }{p^2+\mu _1 p + \mu _0} u(t) + \varepsilon _1(t) \end{aligned}$$
(33)

Here, \(\nu \) is the numerator coefficient and \(\mu _1\) and \(\mu _0\) are the denominator coefficients in the second-order model. The relation can be presented in the equation error form as

$$\begin{aligned} \frac{\mathrm{{d}}^2y(t)}{\mathrm{{d}}t^2} + \mu _1 \frac{\mathrm{{d}}y(t)}{\mathrm{{d}}t} + \mu _0 y(t) = \nu u(t) + \varepsilon _2(t) \end{aligned}$$
(34)

The equation is then integrated twice to get

$$\begin{aligned} y(t) + \mu _1 y^{[1]}(t) + \mu _0 y^{[2]}(t)= \nu u^{[2]}(t) + \varepsilon (t) \end{aligned}$$
(35)

where, for any variable y(t), \(y^{[j]}(t)\) denotes its j-th order integral over the time limits 0 to t. The estimation equation (35) can be rearranged into the least-squares form

$$\begin{aligned} \psi (t) = \varvec{\phi }^T(t) \varvec{\vartheta } + \varepsilon (t) \end{aligned}$$
(36)

where

$$\begin{aligned} \psi (t) = y(t), \;\; \varvec{\phi }^T(t) = \left[ \begin{array}{ccc} - y^{[1]}(t) &{} - y^{[2]}(t) &{} u^{[2]}(t) \\ \end{array} \right] , \;\; \varvec{\vartheta } = \left[ \begin{array}{ccc} \mu _1&\mu _0&\nu \end{array} \right] ^T \end{aligned}$$

Equation (36) can be written for \(t = t_{1}, t_{2}, \ldots , t_N\) and combined to give the estimation equation

$$\begin{aligned} \varPsi = \varPhi \varvec{\vartheta } + \epsilon \end{aligned}$$
(37)

with

$$\begin{aligned} \varPsi = \left[ \begin{array}{c} \psi (t_{1}) \\ \psi (t_{2}) \\ \vdots \\ \psi (t_{N}) \\ \end{array}\right] , \;\; \varPhi = \left[ \begin{array}{c} \varvec{\phi }^T(t_{1}) \\ \varvec{\phi }^T(t_{2}) \\ \vdots \\ \varvec{\phi }^T(t_{N}) \\ \end{array}\right] \end{aligned}$$
(38)

The parameter vector \(\varvec{\vartheta }\) is then obtained as the solution of the least-squares equation as

$$\begin{aligned} \varvec{\vartheta } = (\varPhi ^T \varPhi )^{-1} \varPhi ^T \varPsi \end{aligned}$$
(39)
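A minimal sketch of this initialization step, assuming uniformly sampled data and cumulative trapezoidal integration for the integrals in (35), is given below; the function names are illustrative.

```python
import numpy as np

def cumulative_integral(x, dt):
    """Cumulative trapezoidal integral of x(t) from 0 to t on a uniform grid."""
    out = np.zeros_like(x, dtype=float)
    out[1:] = np.cumsum(0.5 * (x[1:] + x[:-1])) * dt
    return out

def initial_coefficients(u, y, dt):
    """Least-squares estimate of [mu1, mu0, nu] for the integer order model (33), cf. (35)-(39)."""
    y1 = cumulative_integral(y, dt)            # y^[1]
    y2 = cumulative_integral(y1, dt)           # y^[2]
    u1 = cumulative_integral(u, dt)
    u2 = cumulative_integral(u1, dt)           # u^[2]
    Phi = np.column_stack([-y1, -y2, u2])      # regressor matrix, cf. (36) and (38)
    Psi = y                                    # cf. (36)
    # lstsq solves the normal equations of (39)
    theta, *_ = np.linalg.lstsq(Phi, Psi, rcond=None)
    return theta                               # [mu1, mu0, nu]
```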

2.4 Overall Algorithm

The overall methodology is summarized as Algorithm 1, taking the Class II model as an example. The same algorithm is applicable to models with other structures when the appropriate equations are used.

[Algorithm 1]

3 Simulation Results

3.1 Simulation Environment

For the general fractional differentials and integrals of a function \(\omega (t)\), the Grünwald–Letnikov (GL) definition (41) is commonly used; see, e.g., [28].

$$\begin{aligned} _{t_0}D_{t}^{\rho } \omega (t)= \lim _{\eta \rightarrow 0} \frac{1}{\eta ^\rho } \sum _{j=0}^{\left| \frac{t-t_0}{\eta }\right| } (-1)^j \left( \begin{array}{c} \rho \\ j \end{array}\right) \omega (t-j\eta ) \end{aligned}$$
(41)

Here, \(t_0\) and t are the limits of the operator, \(\eta \) is the step size and \(\rho \) is the order, where \(\rho >0\) denotes a derivative operation and \(\rho <0\) an integral operation. Also, |.| denotes the integer part and

$$\begin{aligned} \left( \begin{array}{c} \rho \\ j \end{array} \right) =\frac{\varGamma (\rho +1)}{\varGamma (j+1)\varGamma (\rho -j+1)} \end{aligned}$$
(42)

with \(\varGamma (.)\) being Euler’s Gamma function.

For numerical computation, a revised version of (41), presented in [6], is used, where

$$\begin{aligned} _{t_0}D_{t}^{\rho } \omega (t)= \lim _{\eta \rightarrow 0} \frac{1}{\eta ^{\rho }} \sum _{j=0}^{\left| \frac{t-t_0}{\eta }\right| } w_j^{(\rho )}\omega (t-j\eta ) \end{aligned}$$
(43)

where \(w_j^{(\rho )}\) can be evaluated recursively from

$$\begin{aligned} w_0^{(\rho )}= & {} 1 \end{aligned}$$
(44)
$$\begin{aligned} w_j^{(\rho )}= & {} \left( 1-\frac{\rho + 1}{j}\right) w_{j-1}^{(\rho )} \qquad j=1,2,\cdots \end{aligned}$$
(45)
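A short sketch of (43)–(45) for a sampled signal is given below; the function names are illustrative, and the fixed step size \(\eta \) is taken equal to the sampling interval.

```python
import numpy as np

def gl_weights(rho, n):
    """Recursive weights w_j^(rho) of (44) and (45)."""
    w = np.ones(n)
    for j in range(1, n):
        w[j] = (1.0 - (rho + 1.0) / j) * w[j - 1]
    return w

def gl_differintegral(omega, rho, eta):
    """Grunwald-Letnikov differintegral of order rho (rho > 0: derivative,
    rho < 0: integral) of the samples omega taken with step size eta, cf. (43)."""
    n = omega.size
    w = gl_weights(rho, n)
    out = np.zeros(n)
    for k in range(n):
        # sum_j w_j * omega(t_k - j*eta), scaled by 1/eta^rho
        out[k] = np.dot(w[:k + 1], omega[k::-1]) / eta**rho
    return out
```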

In this work, MATLAB is used to perform the required numerical calculations. The step input is considered to be noise free. The step responses generated for corresponding fractional order models are corrupted with white Gaussian noise. Monte Carlo simulations (MCS) are performed by changing the ‘seed’ for the noise signal. The noise-to-signal ratio (NSR) is defined as the ratio of the variance of the noise to that of the signal. The results presented in this section are from 100 MCS. The presented estimated parameters are the mean of 100 estimates. Parameters are presented along with their standard deviations. The model structures are assumed to be known.
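The noise generation used in the Monte Carlo study can be sketched as follows; the variance-ratio definition of the NSR follows the text, while the seed handling and function name are illustrative assumptions.

```python
import numpy as np

def add_noise(signal, nsr, seed=0):
    """Corrupt a clean step response with white Gaussian noise whose variance is
    nsr times the signal variance; a different seed gives each MCS realization."""
    rng = np.random.default_rng(seed)
    noise_std = np.sqrt(nsr * np.var(signal))
    return signal + rng.normal(0.0, noise_std, size=signal.shape)
```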

3.2 Convergence of Iteration Schemes

The previous section outlined both Newton’s algorithm and the Gauss–Newton algorithm. Convergence issues with both approaches have been discussed in the optimization literature [7, 11]. In this section, a set of simulation results is presented to show the convergence of the two schemes. The following Class I model is used for this purpose.

$$\begin{aligned} G(s) = \frac{1.2 e^{-7s}}{20 s^{\alpha } + 1} \end{aligned}$$
(46)

A set of models having different values of \(\alpha \) is used. For model (46), the parameter vector is identified as \(\varvec{\theta }^T=[a\;b\;\delta \;\alpha ]=[0.05\;0.06\;7\;\alpha ]\). In the format \(G(s)=\frac{K e^{-\delta s}}{\tau s^{\alpha } + 1}\), the parameter vector is presented as \([\tau \;K\;\delta \;\alpha ] =[20\;1.2\;7\;\alpha ]\). Although the coefficients are identified as a and b, we present them as \(\tau \) and K for numerical convenience and clarity in interpretation. A total of 1000 data points are used in each case with a sampling interval of 0.1. A unit step signal is used. The order is initialized with a value of 1. An integer order model is identified using the integral equation approach, and the estimated values are used for initialization. The delay is initialized with a value of 1. The step length \(\lambda \) is set to 0.5.

Fig. 1 Convergence of parameters and error for the Newton’s algorithm (left) and the Gauss–Newton algorithm (right)

Figure 1 shows the trajectories of all parameters and the error from initialization to convergence. The Newton’s algorithm was found to show smooth trajectories for all the parameters. On the other hand, the Gauss–Newton algorithm required fewer iterations for responses without overshoots. For responses with overshoots (\(\alpha >1\)), both algorithms required almost the same number of steps.

The above results are obtained using noise free data. To demonstrate the performance of the two schemes for noisy data, results are shown in Table 1 for data with NSR \(10\%\). Mean values of 100 MCS are presented along with corresponding standard deviations for model (46) with \(\alpha =1.7\).

These results show that both algorithms perform comparably in terms of the quality of the estimates. The convergence rates are \(100\%\) in both cases with an initial guess of 1 for the delay. The convergence rate is defined as the percentage of the 100 MCS runs for which the algorithm converged. To compare the performance of the two algorithms in terms of convergence rate, results are presented in Fig. 2 for model (46) with \(\alpha =1.7\). This model represents a step response with overshoot. As seen from the figure, the Newton’s algorithm showed a convergence rate at or near \(100\%\) for a wide range of initial delay guesses from 0.1 to 13. On the other hand, for the Gauss–Newton approach, the convergence rate was close to \(100\%\) only for initial delays in the range 1 to 4. Similar data are generated for a model with \(\alpha =0.7\), which represents a step response without any overshoot; for this case, the convergence rates are at or near \(100\%\) for initial delays ranging from 1 to 12 for both algorithms. However, the required number of iterations for the Newton’s algorithm is twice that of the Gauss–Newton algorithm.

Table 1 Performance of Newton’s algorithm and Gauss–Newton algorithm
Fig. 2 Convergence rate for the Newton’s and Gauss–Newton algorithms with different initial delays for model (46) with \(\alpha =1.7\)

Based on extensive simulation results, the following remarks can be made

  • With a suitable initial guess of the delay and the initial assumption of an integer order model, both of the schemes are able to provide satisfactory estimates of the parameters.

  • For responses with overshoots, the Newton’s algorithm converges for a wider range of initial guesses of the delay compared to the Gauss–Newton scheme.

  • For responses without overshoots, the Gauss–Newton algorithm requires fewer iterations.

Based on these observations, the Gauss–Newton method is used for the following studies.

3.3 Effect of Data Quality

Two important quality measures for sampled data, namely the data length and the noise-to-signal ratio, are considered to study their effects on the parameter estimates.

Table 2 Effect of data length and sampling interval on parameter estimates

Table 2 shows the identified parameters and their standard deviations for different data lengths. For all these cases, the NSR is \(10\%\). The results show that satisfactory estimates are obtained for data lengths as low as 100 for the Class I model. As expected, the results show better consistency for larger data sets. Also, the required number of iterations decreased slightly with increasing data length.

Figure 3 shows the effect of noise on parameter estimates. A total of 1000 data points are used for all cases. The results show satisfactory performance of the algorithm in terms of mean and standard deviation for NSR as high as \(50\%\).

Fig. 3 Effect of noise on parameter estimates for model (46) with \(\alpha =1.4\)

3.4 Identification of Different Class I Models

A list of Class I models with different orders is considered in this section. The parameters presented are the means of 100 MCS with the corresponding standard deviations in parentheses. The NSR for each case is \(10\%\), and 1000 data points are used. The convergence rate for each of the cases is \(100\%\). The number of iterations varies between 11 and 18, with fewer iterations required for higher values of \(\alpha \). The initial guess of the time delay is 1 for all cases, and an integer order of 1 is used as the initial guess for the order. Initial guesses of the coefficients are obtained assuming an integer order model. The mean values of the estimated parameters match quite well with the corresponding true values; the standard deviations are also comparable for an NSR of \(10\%\) (Table 3).

Table 3 Identification results for different Class I models

3.5 Identification of Class II Models

Table 4 shows the identification results for a set of Class II models. Models with both integer and fractional orders are considered. For each case, 2000 data points are used and the data are corrupted with noise having an NSR of \(10\%\). An initial guess of 1 is used for the delay, and integer orders \([\alpha _2\;\alpha _1]=[2\;1]\) are used for initialization. Convergence rates are \(100\%\) for all cases. The obtained results are quite satisfactory in terms of the means and standard deviations.

Table 4 Class II model identification results

4 Concluding Remarks

The step input has not previously been used for simultaneous estimation of the orders, coefficients and delay of fractional order models. Also, none of the existing optimization methods for fractional order identification estimates the logarithmic derivatives of the input signal, which are required to evaluate the Jacobian and the Hessian. This article provides analytical expressions for the logarithmic derivatives of the step input and for the Jacobian and the Hessian; the resulting optimization scheme does not need to resort to numerical approximations. A simplified initialization procedure is also presented which, in fact, requires a choice of the delay only; however, convergence is observed for a wide range of initial guesses of the delay. The orders are initially assumed to be integers. Initial values of the coefficients are obtained assuming an integer order model. Simulation results show the robustness of the optimization scheme in the presence of noise. Satisfactory results are obtained for data lengths as low as 100 and for NSRs as high as \(50\%\) for different models. The required number of iteration steps is in the range 10–20. The quality of the parameter estimates and the low number of iterations required demonstrate the performance and efficiency of the algorithm.