1 Introduction

Fractional-order calculus is an extension of ordinary differentiation and integration to arbitrary order [1]. In recent decades, it has been widely used to describe complex systems more concisely and precisely, such as viscoelastic structures [2] and heat conduction [3]. It has also attracted increasing attention from the signal processing and systems control community. A tremendous number of valuable results on the system identification [4,5,6], controllability and observability [7, 8], and controller synthesis [9, 10] of fractional-order systems (FOSs) have been reported.

Outliers occur frequently in practice, for instance in signal processing [11, 12] and mechanical devices [13], and can cause serious consequences. In system identification, outliers caused by sensor malfunctions and data transmission errors degrade estimation performance. Thus, they should be eliminated from the observed signal. Several approaches to outlier detection exist, such as the Hampel filter and robust regression methods. However, the Hampel filter performs poorly on coarsely quantized data [14]. Robust regression methods, such as least absolute deviations (LAD) [15], are inherently less sensitive to outliers. Recently, Wright et al. put forward robust principal component analysis (RPCA) [16], which decomposes a corrupted matrix into a low-rank matrix and a sparse matrix by solving a semi-definite programming problem [17]. However, these algorithms apply only to integer-order systems and must filter out measurement noise before detecting outliers. Filtering inevitably reduces the magnitudes of outliers, which degrades detection. Unlike the above approaches, we formulate the outlier detection problem as a matrix decomposition problem with the aid of a new nuclear norm method, which detects outliers accurately and estimates the noise concurrently. Moreover, unlike [18, 19], it does not require constructing a noise model in advance; in general, measurement noise is not easy to describe with a concrete model. After eliminating the detected outliers and the estimated measurement noise from the output signal, we obtain recovered “clean” data for system identification.

Up to now, several kinds of methods for FOS identification in the continuous- and discrete-time domains have been developed. Liao et al. [20] studied a subspace algorithm based on the Poisson filter. Victor et al. [5] proposed an optimal instrumental variable method for estimating transfer function coefficients. In this paper, we study the parameter estimation problem from a new angle. The gradient method is simple and convenient for estimating parametrized models and has attracted wide interest; it was first generalized to a fractional form, the fractional-order gradient method (FOGM), in [21]. There are, of course, other forms and applications of fractional gradients, such as the fractional generalization of gradient systems using the Riemann–Liouville derivative [22] and fractional vector calculus via the Grünwald–Letnikov definition [23], while Caputo's derivative and its discrete difference are the ones most commonly used in the literature. Although the FOGM achieves a faster convergence speed, it cannot converge to the extreme point [24]. To solve this problem, we modify the lower terminal of the fractional calculus in the FOGM. In addition, a fractional-order parameter update law is proved to achieve better steady-state accuracy. Wei et al. extended the conventional gradient estimator to the fractional-order field in the continuous case [25, 26]. In the discrete case, the LMS algorithm was extended to a fractional-order parameter update law by replacing the first-order difference with a fractional-order difference \(0<\alpha <1\) [27]. Unfortunately, neither method alone resolves the conflict between estimation accuracy and convergence speed. To solve this problem, a novel fractional-order update gradient method (FOUGM) is proposed that combines them.

Motivated by the above discussions, this paper aims at designing an effective approach to estimate the parameters of FOSs when the output signal is disturbed by Gaussian white noise and outliers. A novel outlier detection approach is developed via the nuclear norm and infinity norm to detect outliers and estimate noise simultaneously. The modified FOGM with a variable initial value mechanism reduces the effect of the non-locality of fractional calculus. Combining the fractional-order parameter update law with the modified FOGM, the proposed algorithm is developed step by step and significantly improves parameter estimation performance.

After recalling the definitions of the widely used fractional-order calculus and a necessary lemma in Sect. 2, Sect. 3 describes the problem of identifying fractional-order linear systems and proposes a novel outlier detection approach. Then the fractional-order update gradient method is presented and its convergence is analyzed. In Sect. 4, numerical simulations illustrate the validity and superiority of the proposed approach. Conclusions are drawn in Sect. 5.

2 Preliminaries

2.1 Continuous fractional-order calculus

Fractional-order calculus is a generalization and unification of the classical integer-order calculus. In view of its favorable properties, namely that the derivative of a constant is zero and that its Laplace transform involves only integer-order initial values, the following Caputo derivative is adopted in this study:

$$\begin{aligned} {\textstyle {{}_{t_0}{{\mathscr {D}}}_t^\alpha f \left( t \right) = \frac{1}{\varGamma \left( {m - \alpha } \right) } \int _{t_0}^t {\frac{{{f^{\left( m \right) }}\left( \tau \right) }}{{{{\left( {t - \tau } \right) }^{\alpha - m + 1}}}}\mathrm{{d}}\tau }, }} \end{aligned}$$
(1)

where \(m-1< \alpha <m, m\in {\mathbb {N}}_+\), \(\varGamma \left( \alpha \right) = \int _0^\infty {{x^{\alpha - 1}}{\mathrm{e}^{ - x}}\mathrm{{d}}x}\) is the Gamma function, and f(t) is a smooth function. To simplify the notation, the fractional derivative of order \(\alpha \) with the lower terminal at 0 can be denoted as \({{\mathscr {D}}}^\alpha \) instead of \({}_0{{\mathscr {D}}}_t^\alpha \).
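As a numerical aside, for zero initial conditions the Caputo derivative (1) with \(0<\alpha <1\) coincides with the Grünwald–Letnikov derivative used later for simulation (see Remark 4). The following minimal Python sketch of that approximation is our own illustration; the function name and the test function are our choices, not from the original:

```python
import numpy as np
from scipy.special import gamma

def gl_derivative(f, alpha, h):
    """Grunwald-Letnikov approximation of the order-alpha derivative of the
    uniformly sampled signal f (zero initial conditions), step size h."""
    n = len(f)
    # weights w_j = (-1)^j * binom(alpha, j) via the recurrence
    # w_j = w_{j-1} * (1 - (alpha + 1)/j)
    w = np.ones(n)
    for j in range(1, n):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return np.array([np.dot(w[:k + 1], f[k::-1]) for k in range(n)]) / h**alpha

# sanity check: D^{0.5} t = t^{0.5} / Gamma(1.5) for f(t) = t with f(0) = 0
h = 1e-3
t = np.arange(0.0, 1.0, h)
d = gl_derivative(t, 0.5, h)
print(np.max(np.abs(d[100:] - t[100:]**0.5 / gamma(1.5))))  # small residual
```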

Alternatively, when \(0<\alpha <1\), (1) can be rewritten in a form similar to the conventional Taylor series:

$$\begin{aligned} {\textstyle { {}_{t_0}{{\mathscr {D}}}_t^\alpha f \left( t \right) =\sum \limits _{i = 0}^{\infty } \frac{f^{(i+1)}(t_0)}{\varGamma \left( {i+2 - \alpha } \right) } (t-t_0)^{i+1-\alpha }. }} \end{aligned}$$
(2)

2.2 Discrete fractional-order calculus

Several basic definitions and relevant properties of discrete fractional-order calculus will be presented in this section.

The \(\alpha \hbox {th}\) fractional-order sum of a discrete sequence \(f(n),~n\in {\mathbb {N}}_0\), is defined as

$$\begin{aligned} {\textstyle { { \nabla ^{-\alpha } }f\left( n \right) = \sum \limits _{j = 0}^{n} {{{\left( { - 1} \right) }^{j}} \left( {\begin{matrix}-\alpha \\ j \end{matrix}}\right) f\left( {n - j} \right) },}} \end{aligned}$$
(3)

where \(\alpha >0\) and \(\left( {\begin{matrix} p \\ q \end{matrix}}\right) = \frac{{\varGamma \left( {p+1 } \right) }}{{\varGamma \left( {q+1} \right) \varGamma \left( {p-q+1} \right) }}\).

The conventional \(m\hbox {th}\)-order difference is given by

$$\begin{aligned} {\textstyle { { \nabla ^m }f\left( n \right) = \sum \limits _{j = 0}^{m} {{{\left( { - 1} \right) }^{j}} \left( {\begin{matrix} m \\ j \end{matrix}}\right) f\left( {n - j} \right) }. }} \end{aligned}$$
(4)

Then the Caputo fractional difference can be defined as

$$\begin{aligned} {}^\mathrm{C}{ \nabla ^{\alpha } }f\left( n \right) = {\nabla ^{\alpha -m}}{\nabla ^{m}}f\left( n \right) , \end{aligned}$$
(5)

with \(m-1<\alpha <m\).
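A compact Python sketch of definitions (3)–(5) may clarify how the Caputo difference is evaluated in practice. It assumes samples before \(n=0\) are zero, which is our simplifying convention rather than part of the original definitions:

```python
import numpy as np
from math import ceil

def signed_binom(p, J):
    """Weights (-1)^j * binom(p, j) for j = 0..J, via the recurrence
    w_j = w_{j-1} * (1 - (p + 1)/j)."""
    w = np.ones(J + 1)
    for j in range(1, J + 1):
        w[j] = w[j - 1] * (1.0 - (p + 1.0) / j)
    return w

def frac_sum(f, alpha):
    """alpha-th fractional sum (3) of the sequence f(0), ..., f(N)."""
    w = signed_binom(-alpha, len(f) - 1)
    return np.array([np.dot(w[:k + 1], f[k::-1]) for k in range(len(f))])

def caputo_diff(f, alpha):
    """Caputo fractional difference (5): nabla^{alpha - m} applied to nabla^m f."""
    m = ceil(alpha)
    g = np.asarray(f, dtype=float)
    for _ in range(m):   # m-th order backward difference (4), zero pre-history
        g = np.concatenate(([g[0]], np.diff(g)))
    return frac_sum(g, m - alpha)
```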

The discrete Mittag-Leffler function can be expressed as [28]

$$\begin{aligned} {\textstyle { {\mathscr {F}}_{\alpha ,\beta }\left( \lambda ,n \right) =\sum \limits _{j=0}^{\infty }{ {\lambda ^j} {\frac{\varGamma \left( j\alpha +\beta +n-1 \right) }{\varGamma \left( j\alpha +\beta \right) \varGamma \left( n \right) }}}. }} \end{aligned}$$
(6)

Before moving on, the following lemma is necessary.

Lemma 1

(See [29]) If \(m-1<\alpha <m\), then the solution to the fractional difference system

$$\begin{aligned} \nabla ^\alpha f\left( n\right) ={\lambda f\left( n\right) }+u\left( n\right) ,\lambda \ne 1 , \end{aligned}$$
(7)

with initial conditions \(\nabla ^k f(n)|_{n=0}=b_k, k=0,1,\ldots , m-1\), is unique and is given by

$$\begin{aligned} f\left( n\right)= & {} \sum \limits _{k=0}^{m-1}b_k {\mathscr {F}}_{\alpha ,k+1} \left( \lambda ,n\right) \nonumber \\&+\sum \limits _{\tau =1}^n {\mathscr {F}}_{\alpha ,\alpha } \left( \lambda ,\tau \right) u\left( n-\tau +1\right) . \end{aligned}$$
(8)

The system (7) has the following properties with \(u(n)=0\).

(i) If \(\lambda >0\) and \(\lambda \ne 1\), f(n) is divergent.

(ii) If \(\lambda =0\), \(f\left( n\right) =\sum \nolimits _{k=0}^{m-1} {\frac{b_k}{k!}} {\frac{\varGamma \left( k+n \right) }{\varGamma \left( n\right) }}\); therefore, f(n) may be divergent or constant.

(iii) If \(\lambda <0\), f(n) is convergent. More specifically, if \(0<\alpha \le 1\), f(n) is monotonically convergent for large n; if \(1<\alpha <2\), f(n) converges with or without overshoot.

According to Lemma 1, when \(0<\alpha \le 1\) and \(u(n)=0\), only \({\mathscr {F}}_{\alpha ,1} \left( \lambda ,n\right) \) needs to be considered. If \(\lambda <0\) and \(n\rightarrow \infty \), the asymptotic behavior of \({\mathscr {F}}_{\alpha ,1} \left( \lambda ,n\right) \) is [29, 30]

$$\begin{aligned} {\mathscr {F}}_{\alpha ,1} \left( \lambda , n\right)= & {} \sum \limits _{j=0}^{\infty }{ {\lambda ^j} {\frac{\varGamma \left( j\alpha +n \right) }{\varGamma \left( j\alpha +1 \right) \varGamma \left( n \right) }}} \nonumber \\&\sim -{\frac{1}{\lambda (n-1)^{\alpha } \varGamma (1-\alpha )}}. \end{aligned}$$
(9)

In this case, (8) becomes

$$\begin{aligned} {\textstyle { f(n)= b_0 {\mathscr {F}}_{\alpha ,1} \left( \lambda , n\right) \sim -{\frac{b_0}{\lambda (n-1)^{\alpha } \varGamma (1-\alpha )}}. }} \end{aligned}$$
(10)

Moreover, the larger \(|\lambda |\) or \(\alpha \) is, the faster f(n) converges.
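As a quick numerical check of the asymptote (9) (our addition, with illustrative values \(\alpha =0.8\), \(\lambda =-0.5\), \(n=40\)), the truncated series (6) can be evaluated with log-Gamma arithmetic; the two printed values should agree to leading order:

```python
import numpy as np
from scipy.special import gammaln, gamma

def discrete_ml(alpha, beta, lam, n, terms=400):
    """Truncated discrete Mittag-Leffler series (6); assumes |lam| < 1 so the
    partial sums settle within the chosen number of terms."""
    j = np.arange(terms)
    log_ratio = gammaln(j*alpha + beta + n - 1) - gammaln(j*alpha + beta) - gammaln(n)
    return float(np.sum(lam**j * np.exp(log_ratio)))

alpha, lam, n = 0.8, -0.5, 40
exact = discrete_ml(alpha, 1.0, lam, n)
asym = -1.0 / (lam * (n - 1)**alpha * gamma(1 - alpha))   # right-hand side of (9)
print(exact, asym)
```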

3 Parameter estimation of fractional-order linear systems

3.1 Problem description

Consider the fractional-order single-input single-output plant described by the differential equation

$$\begin{aligned} {\textstyle { \sum \limits _{i=0}^{n} a_i {{\mathscr {D}}^{i\alpha } x\left( t\right) }=\sum \limits _{j=0}^{m} b_j {{\mathscr {D}}^{j\alpha } u\left( t\right) }, }} \end{aligned}$$
(11)

where \(m,n\in {\mathbb {N}}\) and \(m<n\). u(t) and x(t) are the input and output signals, respectively. The commensurate order \(\alpha \) is known and satisfies \(0<\alpha < 1\). \(a_i\) \((i=0,1,\ldots ,n-1)\) and \(b_j\) \( (j=0,1,\ldots ,m)\) are unknown constants. Let y(t) be a noisy observation of x(t), where \(\varepsilon (t)\) is Gaussian white noise and z(t) consists of zeros and outliers:

$$\begin{aligned} y \left( t \right) =x\left( t \right) + \varepsilon \left( t \right) +z\left( t \right) . \end{aligned}$$
(12)

Substituting (12) into the system Eq. (11) and denoting \(v(t)=\varepsilon \left( t \right) +z\left( t \right) \), we rewrite the equation as

$$\begin{aligned} {\textstyle { \sum \limits _{i=0}^{n} a_i {{\mathscr {D}}^{i\alpha } y\left( t\right) }=\sum \limits _{j=0}^{m} b_j {{\mathscr {D}}^{j\alpha } u\left( t\right) } +\sum \limits _{i=0}^{n} a_i {{\mathscr {D}}^{i\alpha } v(t)}. }}\nonumber \\ \end{aligned}$$
(13)

Unlike the discrete-time case, the difficulty of direct identification of continuous-time systems lies in the inability to measure derivatives of arbitrary order of the input and output signals, and approximating the derivative terms amplifies the impact of noise.

In this paper, the Poisson moment functional (PMF) method [31] is chosen to overcome this difficulty. The transfer function of the PMF is given by

$$\begin{aligned} {\mathscr {L}}[g_l(t)]=G_l(s)=\left( \frac{\eta }{s+\xi } \right) ^l, \end{aligned}$$
(14)

where s is the Laplace variable and \(g_l(t)\) is the \(l\hbox {th}\)-order Poisson pulse function. \(\xi \in {\mathbb {R}}^+ \) and \(\eta \in {\mathbb {R}}^+ \) are the Poisson filter constant and gain, respectively. More detailed information about the PMF can be found in [6].

The essential idea of pre-filtering with the Poisson filter is to transfer the fractional-order time derivative onto a known function. The pre-filtering procedure is defined as follows [6]:

$$\begin{aligned} P_{g_l(t)} [{\mathscr {D}}^{\alpha }y(t)]&=g_l(t)*{\mathscr {D}}^{\alpha }y(t)\nonumber \\&={\mathscr {L}}^{-1}[G_l(s) s^{\alpha } Y(s)]\nonumber \\&={\mathscr {D}}^{\alpha }g_l(t)*y(t), \end{aligned}$$
(15)

where \(*\) represents the convolution product operator. Then performing this operation on Eq. (13), one can obtain

$$\begin{aligned} \sum \limits _{i=0}^{n} a_i P_{g_l(t)} [{{\mathscr {D}}^{i\alpha } y(t)}]=&\sum \limits _{j=0}^{m} b_j P_{g_l(t)} [{{\mathscr {D}}^{j\alpha } u\left( t\right) }] \nonumber \\&+\sum \limits _{i=0}^{n} a_i P_{g_l(t)}[{{\mathscr {D}}^{i\alpha } v(t)}]. \end{aligned}$$
(16)
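A minimal sketch of how the pre-filtered signals might be computed numerically: pass the sampled signal through \(G_l(s)\) and then apply a Grünwald–Letnikov difference of order \(\alpha \), since the two linear operators commute under zero initial conditions as in (15). The filter parameters \(\eta =\xi =0.5\), \(l=2\) anticipate the choice made in Sect. 4.2; everything else is our illustrative scaffolding:

```python
import numpy as np
from scipy.signal import lsim, TransferFunction

def gl_derivative(f, alpha, h):
    # Grunwald-Letnikov derivative under zero initial conditions (Sect. 2.1 sketch)
    w = np.ones(len(f))
    for j in range(1, len(f)):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return np.array([np.dot(w[:k + 1], f[k::-1]) for k in range(len(f))]) / h**alpha

def pmf(y, t, alpha, eta=0.5, xi=0.5, l=2):
    """P_{g_l}[D^alpha y]: filter y through G_l(s) = (eta/(s + xi))^l as in (14),
    then differentiate the filtered signal to order alpha."""
    G = TransferFunction([eta**l], np.poly([-xi] * l))   # denominator (s + xi)^l
    _, yf, _ = lsim(G, U=y, T=t)
    return gl_derivative(yf, alpha, t[1] - t[0])
```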

Since the concept of continuous-time noise or outliers is not a trivial extension of the discrete-time case [32], the data are collected with a sampling period small enough that the approximation errors in the numerical computation of the fractional derivatives are negligible. Moreover, the number of samples is assumed to be large enough to guarantee the convergence of the estimated parameters to the true ones. Define \(y(k)=y(kT), u(k)=u(kT), v(k)=v(kT)\), and \(y_i(k)=P_{g_l(k)} [{{\mathscr {D}}^{i\alpha } y(k)}], u_j(k)=P_{g_l(k)} [{\mathscr {D}}^{j\alpha } u(k)], V(k)= \sum \limits _{i=0}^{n} a_i P_{g_l(k)}[{{\mathscr {D}}^{i\alpha } v(k)}]\), \((k=1,2,\ldots ,N)\). Then, Eq. (16) can be rewritten as

$$\begin{aligned} {\textstyle { \sum \limits _{i=0}^{n} a_i y_i(k)=\sum \limits _{j=0}^{m} b_j u_j(k)+V(k). }} \end{aligned}$$
(17)

Without loss of generality, let \(a_{n}=1\). Then Eq. (17) can be expressed as

$$\begin{aligned} y_n(k)=\varphi _k^\mathrm{T} \theta +V(k), \end{aligned}$$
(18)

where \(\varphi _k=[-y_0(k), -y_1(k),\ldots ,-y_{n-1}(k),u_0(k),u_1(k), \ldots ,u_m(k)]^\mathrm{T}\) and \(\theta =[a_0, a_1,\ldots ,a_{n-1},b_0,b_1,\ldots ,b_m]^\mathrm{T}\). The least squares estimate for \(\theta \) is given by

$$\begin{aligned} {\textstyle { \hat{\theta }=\bigg [\sum _{k=1}^{N} \varphi _k \varphi _k^\mathrm{T}\bigg ]^{-1} \bigg [\sum _{k=1}^{N} \varphi _k y_n(k)\bigg ]. }} \end{aligned}$$
(19)
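In matrix form, stacking the rows \(\varphi _k^\mathrm{T}\) into \(\varPhi \) and the outputs \(y_n(k)\) into a vector, (19) is the usual normal-equation solution; a one-line numpy sketch:

```python
import numpy as np

def ls_estimate(Phi, yn):
    """Batch least squares (19); Phi has shape N x (n + m + 1), yn has length N."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ yn)
```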

The least squares estimate (19) is biased even when only Gaussian white noise is present, let alone outliers. Since outliers in y(k) can cause large errors in the parameter estimates, they should be eliminated from the observed signal.

3.2 Outliers detection and noise estimation

We formulate the outlier detection problem as a matrix decomposition problem with the help of the nuclear norm and the infinity norm. To the best of our knowledge, ours is the first method to detect outliers and estimate noise simultaneously.

Construct a Hankel matrix from y(k) as follows:

$$\begin{aligned} D= \left[ \begin{array}{ccccc} y(1) &{} y(2) &{} y(3) &{} \cdots &{} y(q)\\ y(2) &{} y(3) &{} y(4) &{} \cdots &{} y(q+1) \\ y(3) &{} y(4)&{} y(5) &{} \cdots &{} y(q+2)\\ \vdots &{} \vdots &{} \vdots &{} \ddots &{} \vdots \\ y(p) &{} y(p+1) &{} y(p+2) &{} \cdots &{} y(N_D) \end{array} \right] , \end{aligned}$$
(20)

where \(p+q-1=N_D\) and \(p\le q \le N\). Assume the system order is much smaller than the dimensions of D. The matrix decomposition problem is formulated as

$$\begin{aligned} D=L+S, \end{aligned}$$
(21)

where \( L= \left[ \begin{array}{cccc} x(1) &{} x(2) &{} \cdots &{} x(q)\\ x(2) &{} x(3) &{} \cdots &{} x(q+1) \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ x(p) &{} x(p+1) &{} \cdots &{} x(N_D) \end{array} \right] \) and \( \quad S= \left[ \begin{array}{cccc} v(1) &{} v(2) &{} \cdots &{} v(q)\\ v(2) &{} v(3) &{} \cdots &{} v(q+1) \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ v(p) &{} v(p+1) &{} \cdots &{} v(N_D) \end{array} \right] . \)

Our goal is to recover L from (21). For an integer-order system, L is a low-rank matrix, so the recovery can be achieved by solving a rank minimization problem [16] in which S is modeled as a sparse matrix. Here, however, L is not low-rank due to the non-locality of fractional calculus. We therefore construct the following convex problem:

$$\begin{aligned} \begin{aligned}&\mathop {\min }\limits _{v} \quad \parallel L \parallel _{*} +r\parallel S\parallel _{\infty }, \\&\mathrm {s.t.} \quad L+S=D, \end{aligned} \end{aligned}$$
(22)

where r is a tuning parameter.

Instead of minimizing the rank directly, as in a general rank minimization problem, the rank is replaced by its convex surrogate, the nuclear norm. Since the fractional-order derivative of a function at a point depends mainly on recent function values, and the effect of earlier values can be ignored, y(k) is mainly related to recent data; hence the nuclear norm remains a valid criterion. However, because measurement noise is inevitable in practice, S cannot be modeled as a sparse matrix, so we replace the 0-norm with the infinity norm for the first time, enlarging the range of practical applications.
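The paper solves (22) with the MATLAB CVX toolbox (Sect. 4.1); the sketch below restates the problem in CVXPY as a rough Python analogue. Treating the sequence v as the decision variable makes \(L+S=D\) hold by construction, and \(\parallel S\parallel _{\infty }\) is read here as the entrywise maximum, which for the Hankel structure equals \(\max _k |v(k)|\); both readings are our assumptions:

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import hankel

def detect_outliers(y, p, q, r=0.55):
    """Rough CVXPY analogue of problem (22). Returns the estimated
    noise-plus-outlier sequence v(k); y - v is the recovered output."""
    ND = p + q - 1
    D = hankel(y[:p], y[p - 1:ND])                 # p x q Hankel matrix (20)
    v = cp.Variable(ND)
    S = cp.vstack([v[i:i + q] for i in range(p)])  # Hankel matrix built from v
    objective = cp.normNuc(D - S) + r * cp.norm(v, 'inf')   # L = D - S
    cp.Problem(cp.Minimize(objective)).solve()
    return v.value
```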

Remark 1

One may attempt to filter out the noise before detecting outliers, but the filter also reduces the magnitudes of the outliers, causing unsatisfactory detection results. With the aid of the infinity norm, not only can accurate detection be achieved, but the noise can also be estimated to a certain degree at the same time. In addition, accurate detection results are obtained whether or not noise exists.

3.3 Parameter estimation algorithm

After eliminating the detected outliers and the estimated noise, the recovered “clean” data are used to estimate the system parameters. The FOUGM is put forward together with an analysis of its convergence and effectiveness.

Similar to Eq. (18),

$$\begin{aligned} \bar{y}_n(k)= \bar{\varphi }_k^\mathrm{T} \theta +\epsilon (k), \end{aligned}$$
(23)

where \(\bar{\varphi }_k=[-\bar{y}_0(k), -\bar{y}_1(k),\ldots ,-\bar{y}_{n-1}(k),u_0(k),u_1(k), \ldots ,u_m(k)]^\mathrm{T}\), \(\bar{y}_i(k)=P_{g_l(k)} [{{\mathscr {D}}^{i\alpha } \bar{y}(k)}]\) with the data \(\bar{y}(k)\) recovered from the measured output y(k), and \(\epsilon (k)=\sum \nolimits _{i=0}^{n} a_i P_{g_l(k)}[{{\mathscr {D}}^{i\alpha } \bar{\varepsilon } (k)}]\), \((k=1,2,\ldots ,N)\). It is worth mentioning that the recovered data \(\bar{y}(k)\) still contain a small amount of residual noise. Thanks to the proposed outlier detection method, all outliers are detected accurately, so \(\bar{y}(k)\) no longer contains z(k). The predicted output is

$$\begin{aligned} \hat{y}_n(k)= \bar{\varphi }_k^\mathrm{T} \hat{\theta }_k, \end{aligned}$$
(24)

where \(\hat{\theta }_k\) and \(\hat{y}_n(k)\) are the estimates of \(\theta \) and \(\bar{y}_n(k)\), respectively.

Set the prediction error as

$$\begin{aligned} e(k)&=\bar{y}_n(k)-\hat{y}_n(k) \nonumber \\&= \bar{\varphi }_k^\mathrm{T}(\theta - \hat{\theta }_k) + \epsilon (k) \nonumber \\&=\bar{\varphi }_k^\mathrm{T} \tilde{\theta }_k+\epsilon (k), \end{aligned}$$
(25)

where \(\tilde{\theta }_k=(\theta - \hat{\theta }_k)\) is the parameter vector estimation error.

The objective function is chosen as

$$\begin{aligned} \begin{aligned} J ( \hat{\theta } )= \frac{1}{2}{e(k)}^2 = \frac{1}{2}[\bar{\varphi }_k^\mathrm{T}(\theta - \hat{\theta }_k)+\epsilon (k)]^2. \end{aligned} \end{aligned}$$
(26)

Identifying \(\theta \) in (26) thus amounts to finding a \(\hat{\theta }_k\) that minimizes \(J(\hat{\theta })\). The FOGM can be used to derive the recursive equation for \(\hat{\theta }_k\), given as [21]

$$\begin{aligned} \hat{\theta }_{k+1} -\hat{\theta }_{k}= -\mu _{\hat{\theta } _0} {{\mathscr {D}}}_ {\hat{\theta }_k}^{\gamma } { J( \hat{\theta }) } , \end{aligned}$$
(27)

where \(\mu >0\) is the iteration step size and \(0<\gamma <1\). However, this scheme cannot converge to the true extreme point [24]. To solve this problem, a modified counterpart is established with the aid of a variable initial value mechanism, expressed as

$$\begin{aligned} \hat{\theta }_{k+1} -\hat{\theta }_{k}= -\mu _{\hat{\theta } _{k-1}} {{\mathscr {D}}}_ {\hat{\theta }_k}^{\gamma } { J( \hat{\theta }) }. \end{aligned}$$
(28)

On the basis of definition (2), and since an infinite Taylor series cannot be implemented in practice, only the first term is retained to reduce complexity, yielding

$$\begin{aligned} {}_{\hat{\theta } _{k-1}} {{\mathscr {D}}}_ {\hat{\theta }_k}^{\gamma } { J( \hat{\theta }) } = \frac{J^{(1)}({\hat{\theta } _{k-1}})}{\varGamma \left( {2 - \gamma } \right) } ({\hat{\theta }_k}-{\hat{\theta } _{k-1}})^{1-\gamma }. \end{aligned}$$
(29)

The step size of a gradient descent method must be positive to guarantee that \(\hat{\theta }_k\) is corrected along the negative gradient direction until the extreme value of \(J(\hat{\theta })\) is reached. We therefore replace \(({\hat{\theta }_k}-{\hat{\theta } _{k-1}})^{1-\gamma }\) with \( |{\hat{\theta }_k}-{\hat{\theta } _{k-1}}|^{1-\gamma }\), which yields

$$\begin{aligned} \begin{aligned} {}_{\hat{\theta } _{k-1}} {{\mathscr {D}}}_ {\hat{\theta }_k}^{\gamma } { J( \hat{\theta }) } =&\frac{J^{(1)}({\hat{\theta } _{k}})}{\varGamma \left( {2 -\gamma } \right) } |{\hat{\theta }_k}-{\hat{\theta } _{k-1}}|^{1-\gamma } \\ =&-\frac{|{\hat{\theta }_k}-{\hat{\theta } _{k-1}}|^{1-\gamma }}{\varGamma (2-\gamma )} \bar{\varphi }_k e(k) \\ =&- \frac{|{\hat{\theta }_k}-{\hat{\theta } _{k-1}}|^{1-\gamma }}{\varGamma (2-\gamma )} \bar{\varphi }_k [ \bar{\varphi }_k^\mathrm{T}(\theta - \hat{\theta }_k)\\&\quad +\epsilon (k)]. \end{aligned} \end{aligned}$$
(30)

Note that \(J^{(1)}({\hat{\theta } _{k-1}})\) has been replaced by \(J^{(1)}({\hat{\theta } _{k}})\). This guarantees the convergence of the algorithm, since \(\mathop {\lim }\limits _{k\rightarrow \infty } J^{(1)}({\hat{\theta } _{k}}) |{\hat{\theta }_k}-{\hat{\theta } _{k-1}}|^{1-\gamma } =0 \) implies \(J^{(1)}({\hat{\theta } _{k}})=0\) whenever \(|{\hat{\theta }_k}-{\hat{\theta } _{k-1}}|^{1-\gamma } \ne 0\).

Replacing the first-order difference on the left-hand side of (28) with a fractional-order difference gives a fractional-order parameter update law. Combining it with the modified FOGM yields the FOUGM, described as

$$\begin{aligned} \nabla ^{\beta } \hat{\theta }_{k} = -\mu _{\hat{\theta } _{k-1}} {{\mathscr {D}}}_ {\hat{\theta }_k}^{\gamma } { J( \hat{\theta }) }, \end{aligned}$$
(31)

where \(0<\beta \le 2\).

According to definitions (3), (4) and (5), it is easy to obtain

$$\begin{aligned} \begin{aligned} \nabla ^\beta \hat{\theta }_{k}&= \nabla ^{\beta -m} \nabla ^m \hat{\theta }_{k} \\&= \nabla ^m \hat{\theta }_{k} + \sum \limits _{j=1}^{k} \left( -1\right) ^j {\beta -m \atopwithdelims ()j} \nabla ^m \hat{\theta }_{k-j}\\&= \nabla ^m \hat{\theta }_{k}+ \sum \limits _{j=0}^{k-1} \left( -1\right) ^{j+1} {\beta -m \atopwithdelims ()j+1} \nabla ^m \hat{\theta }_{k-j-1}. \end{aligned} \end{aligned}$$
(32)

Defining \(g\left( k\right) =\left( -1\right) ^{k+1} { {\beta -m} \atopwithdelims (){k+1} } \) and \( \hat{\theta }_d \left( k \right) =g\left( k\right) * \nabla ^m \hat{\theta } \left( k \right) \), where \(*\) denotes convolution, we get

$$\begin{aligned} \nabla ^\beta \hat{\theta }_{k}= \nabla ^{m} \hat{\theta }_{k} + \hat{\theta }_d \left( k-1\right) . \end{aligned}$$
(33)
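A short sketch of how the weights g(k) and the history term \(\hat{\theta }_d\) can be computed (vector case, elementwise; the start-up handling is our simplification):

```python
import numpy as np

def g_weights(beta, m, K):
    """g(k) = (-1)^{k+1} binom(beta - m, k + 1), k = 0..K-1, via the same
    recurrence as for the fractional sum weights."""
    p = beta - m
    w = np.ones(K + 1)                      # w_j = (-1)^j binom(p, j)
    for j in range(1, K + 1):
        w[j] = w[j - 1] * (1.0 - (p + 1.0) / j)
    return w[1:]                            # g(k) = w_{k+1}

def theta_d(theta_hist, beta, m=1):
    """theta_d(k-1) = sum_j g(j) nabla^m theta_{k-1-j}, cf. (32)-(33);
    theta_hist holds theta_0, ..., theta_k row by row."""
    dth = np.diff(np.asarray(theta_hist), n=m, axis=0)   # nabla^m theta samples
    if len(dth) < 2:
        return np.zeros_like(theta_hist[-1])
    g = g_weights(beta, m, len(dth) - 1)
    return sum(g[j] * dth[-2 - j] for j in range(len(dth) - 1))
```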

Combining (30), (31) and (33), the designed algorithm can be divided into the following two cases.

(i) If \(0<\beta \le 1\), then \( m=1\) and

$$\begin{aligned} \hat{\theta }_{k+1}=&\hat{\theta }_k- \hat{\theta }_d \left( k-1\right) + \mu \frac{|{\hat{\theta }_k}-{\hat{\theta }_{k-1}}|^{1-\gamma }}{\varGamma (2-\gamma )} \bar{\varphi }_k e(k).\nonumber \\ \end{aligned}$$
(34)

(ii) If \(1<\beta \le 2\), then \( m=2\) and

$$\begin{aligned} \begin{aligned} \hat{\theta }_{k+1}=&2 \hat{\theta }_k -\hat{\theta }_{k-1} - \hat{\theta }_d \left( k-1\right) \\&+ \mu \frac{|{\hat{\theta }_k}-{\hat{\theta } _{k-1}}|^{1-\gamma }}{\varGamma (2-\gamma )} \bar{\varphi }_k e(k). \end{aligned} \end{aligned}$$
(35)

Remark 2

As can be seen from formula (34), in contrast to an integer-order update, the term \(\hat{\theta }_d(k-1)\) appears: it is a weighted sum of the history of \(\hat{\theta }\) from \(\hat{\theta }_1\) to \(\hat{\theta }_{k-1}\), which is equivalent to smoothing the parameter estimate. Thus, the fractional-order update law has an advantage over the integer-order one. To avoid repetition, only the case \(0<\beta \le 1\) is considered hereinafter.

The whole process proposed above can be described as follows:

The algorithm of FOUGM for fractional-order systems with outliers

Step 1:

Solve the convex problem (22). Then eliminate the detected outliers and the estimated noise from the output signal, obtaining the recovered output \(\bar{y}(k)\)

Step 2:

Pre-filter the input and the recovered output data with the PMF method described by (15), obtaining the filtered data in (23)

Step 3:

Set the initial value of \(\hat{\theta }\) to a nonzero vector. Choose a suitable update order \(\beta \), differential order \(\gamma \), and step size \(\mu \)

Step 4:

Compute \( \hat{\theta }\) by iterating (25) and (34); a code sketch follows
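A condensed Python sketch of Steps 3–4 (the FOUGM recursion (34) for \(0<\beta \le 1\), so \(m=1\)). Treating \(|{\hat{\theta }_k}-{\hat{\theta } _{k-1}}|^{1-\gamma }\) elementwise and the handling of the start-up samples are our own choices, not prescribed by the original:

```python
import numpy as np
from scipy.special import gamma

def fougm(Phi, ybar, beta=0.9, gam=0.7, mu=0.04, theta0=None):
    """FOUGM sketch: Phi is N x d with rows phibar_k^T, ybar holds ybar_n(k)."""
    N, d = Phi.shape
    th = [np.full(d, 0.1) if theta0 is None else np.asarray(theta0, float)]
    th.append(th[0] + 1e-3)                # two starting points (Step 3, nonzero)
    w = np.ones(N + 1)                     # w_j = (-1)^j binom(beta - 1, j)
    for j in range(1, N + 1):
        w[j] = w[j - 1] * (1.0 - beta / j)
    g = w[1:]                              # g(k) = (-1)^{k+1} binom(beta - 1, k + 1)
    for k in range(1, N):
        e = ybar[k] - Phi[k] @ th[-1]                              # error (25)
        dth = np.diff(np.asarray(th), axis=0)                      # nabla theta
        td = sum(g[j] * dth[-2 - j] for j in range(len(dth) - 1))  # theta_d(k-1)
        step = mu * np.abs(th[-1] - th[-2])**(1.0 - gam) / gamma(2.0 - gam)
        th.append(th[-1] - td + step * Phi[k] * e)                 # update (34)
    return th[-1]
```

The history convolution inside the loop makes this O(N^2) overall; in practice the weights g(k) are often truncated to a fixed window (short-memory principle) to keep the cost linear.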

3.4 Analysis of convergence

Recall that \(\tilde{\theta }_k=\theta - \hat{\theta }_k\) and combine (30) and (31). Under the Caputo fractional-order difference definition, \( \nabla ^\beta \theta =0\) since \(\theta \) is constant; therefore,

$$\begin{aligned} \begin{aligned} \nabla ^\beta \tilde{\theta }_k=&-\nabla ^\beta \hat{\theta }_k \\ =&\mu _{\hat{\theta } _{k-1}} {{\mathscr {D}}}_ {\hat{\theta }_k}^{\gamma } { J( \hat{\theta }) } \\ =&-\mu \frac{|{\hat{\theta }_k}-{\hat{\theta } _{k-1}}|^{1-\gamma }}{\varGamma (2-\gamma )} \bar{\varphi }_k e(k)\\ =&-\rho _k \bar{\varphi }_k [\bar{\varphi }_k^\mathrm{T}\tilde{\theta }_k +\epsilon (k)] , \end{aligned} \end{aligned}$$
(36)

where \(\rho _k=\mu \frac{|{\hat{\theta }_k}-{\hat{\theta } _{k-1}}|^{1-\gamma }}{\varGamma (2-\gamma )} >0 \), which acts as a variable step size related to the differential order \(\gamma \).

Since the PMF filter is a linear transformation and \(\varepsilon (k)\) is Gaussian white noise with zero mean, \(E[\epsilon (k)]=\sum \nolimits _{i=0}^{n} a_i P_{g_l(k)} E[{{\mathscr {D}}^{i\alpha } \bar{\varepsilon } (k)}] =0\) and \(\bar{\varphi }_k \) is uncorrelated with \(\epsilon (k)\) [6]. Taking the expectation of (36) and diagonalizing \(E[\bar{\varphi }_k \bar{\varphi }_k^\mathrm{T}]\) with a unitary matrix Q gives

$$\begin{aligned} \nabla ^\beta E[\tilde{\theta }_k]=-\rho _k Q^{-1}\varLambda Q E[ \tilde{\theta }_k] , \end{aligned}$$
(37)

or equivalently,

$$\begin{aligned} \nabla ^\beta E[\tilde{\vartheta }_k]=-\rho _k \varLambda E[ \tilde{\vartheta }_k], \end{aligned}$$
(38)

where \(E[ \tilde{\vartheta } _k]= Q E[\tilde{\theta }_k]\) and \(\varLambda =\mathrm{{diag}} \{ \lambda _1, \lambda _2, \lambda _3, \ldots , \lambda _{n+m+1}\}\) with \(\lambda _j>0, j=1,2,\ldots ,n+m+1\).

Considering (10), the solution to each component of the matrix Eq. (38) is given by

$$\begin{aligned} \begin{aligned} E[\tilde{\vartheta }_j(k)]&=q_j {\mathscr {F}}_{\beta ,1} \left( -\rho _k \lambda _j, k\right) \\&\sim {\frac{q_j}{\rho _k \lambda _j \left( k-1\right) ^{\beta } \varGamma \left( 1-\beta \right) }}, \end{aligned} \end{aligned}$$
(39)

where \(q_j=E[\tilde{\vartheta }_j \left( 0\right) ]\) is the initial condition for the \(j\hbox {th}\) estimated parameter.

According to Lemma 1, \(E[\tilde{\vartheta }_j \left( k\right) ] \) converges to 0 as \(k \rightarrow \infty \). Thus, \(E[\hat{\theta }_k]\) converges to \(\theta \) when k is large enough. Moreover, as can be seen from (39), \(E[\tilde{\vartheta }_j \left( k\right) ]\) approaches 0 faster for a larger update order \(\beta \) and step size \(\rho _k\); equivalently, \(E[\hat{\theta }_k ]\) converges to \(\theta \) faster with larger \(\beta \) and \(\rho _k\).

Remark 3

The adoption of the variable initial value mechanism reduces the effect of the non-locality of fractional calculus, which enables the FOGM to converge. Thanks to the fractional-order parameter update law, the convergence performance of the proposed algorithm is improved significantly. Moreover, it can also be applied to filtering and other applications.

4 Illustrative examples and analysis

In this section, the effectiveness and superiority of the proposed parameter estimation method are verified by numerical examples. To reduce the computational complexity, an infinite impulse response filter is used to approximate the fractional-order sum operator in these examples [33].

4.1 Detect outliers and estimate noise simultaneously

Consider the FOS

$$\begin{aligned}&{\mathscr {D}} ^\alpha x(t)+a x(t)=b u(t), \end{aligned}$$
(40)
$$\begin{aligned}&y \left( t \right) =x\left( t \right) + \varepsilon \left( t \right) +z\left( t \right) , \end{aligned}$$
(41)

where \(\varepsilon \left( t \right) \) is Gaussian white noise with a noise level of 20 dB and z(t) consists of zeros and outliers. A PRBS signal is chosen as the input with sampling time \(T=0.02\) s. Select \(\alpha =0.9\), \(a=0.5\), \(b=0.2\). The initial conditions of the FOS are chosen as \(x(t)=x(0)=0, -\infty \le t<0\). Because the input signal and the Gaussian white noise are stochastic, a Monte Carlo experiment with 20 runs is carried out. y(k) denotes the sampled values of y(t).
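For reproducibility, a sketch of this simulation setup under our own illustrative choices (a crude ±1 random input standing in for the PRBS, outlier amplitude 2.0 at 90 random instants, matching the outlier count reported below):

```python
import numpy as np

def simulate_fos(u, T=0.02, alpha=0.9, a=0.5, b=0.2):
    """Simulate D^alpha x + a x = b u by the Grunwald-Letnikov discretization
    (cf. Remark 4), zero initial conditions."""
    N, Ta = len(u), T**alpha
    w = np.ones(N)                           # GL weights (-1)^j binom(alpha, j)
    for j in range(1, N):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    x = np.zeros(N)
    for k in range(1, N):
        hist = np.dot(w[1:k + 1], x[k - 1::-1])     # past terms of the GL sum
        x[k] = (b * u[k] - hist / Ta) / (a + 1.0 / Ta)
    return x

rng = np.random.default_rng(0)
N = 4800
u = np.sign(rng.standard_normal(N))                 # PRBS-like +/-1 input
x = simulate_fos(u)
eps = rng.standard_normal(N) * np.std(x) * 10**(-20 / 20)   # 20 dB white noise
z = np.zeros(N)
z[rng.choice(N, 90, replace=False)] = 2.0           # sparse outliers
y = x + eps + z                                     # measured output (41)
```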

An input signal and the corresponding output signal are presented in Fig. 1.

Fig. 1

a Input, noise-free output. b Measured output with white noise and outliers

Remark 4

The Grünwald–Letnikov definition is chosen for the numerical implementation of this fractional-order system. Other methods are also acceptable [34].

Case 1 The output signal is polluted only by outliers

The optimization problem (22) is solved with the CVX toolbox of MATLAB on a 3.0 GHz processor with 4 GB of memory. Let \(p=10, q=280\), and \(r=0.55\). To reduce the computational burden, the data segment \(0\le t\le 5.6\) s, i.e., 280 samples, is chosen. To compare our method with the one presented in [17] and the Hampel filter, the corresponding output signal and detection results are shown in Fig. 2 and Table 1.

Fig. 2

Comparison of detection results

Table 1 Comparison of outliers detected by [17] and our method

As shown in Fig. 2, all three methods can detect outliers accurately when measurement noise is absent. However, the detection accuracy of the proposed method is superior to that of [17], as shown in Table 1.

On the other hand, the computation times of the Hampel filter, the proposed method, and [17] are 1.25 s, 33.29 s, and 138.14 s, respectively. It is easy to see why [17] takes longer than our method: it uses \(p=q=\frac{N_D+1}{2}\) for the dimensions of the matrix D, whereas we select \(p=10\), \(q=N_D+1-p\), making D smaller. Thus, the computational cost of the proposed method lies between those of the other two methods. In other words, the proposed method achieves a trade-off between detection accuracy and computation cost.

Case 2 The output signal is polluted by Gaussian white noise and outliers

Since in practice the observation y(t) is often corrupted by measurement noise as well, it is worthwhile to study the outlier detection problem when \(\varepsilon (t)\ne 0\), as shown in Fig. 1. Note that the total number of samples is \(N=4800\). The output data are divided into ten groups sequentially, with \(p=10, q=480\), and \(r=0.55\). Considering the high computational cost of the approach in [17] and the fact that it must filter the noise first, only the Hampel filter is compared with our method in the following simulations.

Fig. 3

a Output signal and outliers detected by the Hampel filter. b Real and estimated noise, and outliers detected by the proposed method

In Fig. 3a, the number of outliers detected by the Hampel filter is 235, while the real number is 90; in other words, some noise samples are mistakenly identified as outliers by the Hampel filter. As shown in Fig. 3b, all the outliers are detected accurately by the proposed method. Meanwhile, the measurement noise is also estimated, although the estimated values are slightly smaller than the real values. The noise level is thus reduced, as validated in Fig. 4. Compared with methods that only detect outliers or only filter, such as RPCA, the Hampel filter, and LAD, the proposed approach based on the nuclear norm and infinity norm is both effective and superior, which further enlarges its range of practical application. After eliminating the detected outliers and the estimated noise from the output signal, we obtain the recovered output signal.

Fig. 4

Recovered output

4.2 Estimate parameters using recovered output data

In the following simulations, the recovered output signal is used to estimate the parameters of the fractional-order system described by Eq. (40). The Poisson filter is selected as \(G_l(s)=\left( \frac{0.5}{s+0.5} \right) ^2\), i.e., \(\eta =\xi =0.5, l=2\), which conforms to the bandwidth and order conditions.

The relative error of the parameter estimates is used as the evaluation criterion of the algorithm and is defined as

$$\begin{aligned} \delta _x=\frac{\parallel x-x_t \parallel }{\parallel x \parallel }, \end{aligned}$$
(42)

where \(x_t\) is the estimate of the true value x. In the following simulations, the data for \(t\ge 30\) s, where the estimates have reached steady state, are used to calculate the relative error \(\delta \).

Firstly, we consider the effect of the differential order \(\gamma \) on the estimation performance. For a fair comparison, the step size is set to \(\mu =0.02\) for \(\beta =1,\gamma =1\), which gives the best estimation results of the conventional gradient method, with relative errors \(\delta _a=1.029\%, \delta _b=5.740\%\). Set \(\mu =0.25\) when the modified fractional-order gradient method is adopted. Simulation results are shown in Fig. 5.

Note that the convergence speed increases as the differential order \(\gamma \) increases. When \(\gamma =0.7\), the relative errors are \(\delta _a=0.723\%, \delta _b=3.609\%\). Comparing the cases \(\gamma =0.7\) and \(\gamma =1\), the estimation accuracy obtained by the modified FOGM is better than that of the conventional gradient method.

Fig. 5

The effect of \(\gamma \)

Fig. 6

The effect of \(\beta \)

Secondly, we study the influence of the update order \(\beta \) on the estimation performance. Set the step size \(\mu =0.04\). As can be seen from the simulation results in Fig. 6, the larger \(\beta \) is, the faster the convergence. When \({\beta =0.9,\gamma =1}\), the relative errors are \(\delta _a=0.367\%, \delta _b=2.314\%\).

Comparing Fig. 5 with Fig. 6, the modified FOGM achieves faster convergence, while the fractional-order parameter update law achieves superior estimation accuracy. A natural idea is therefore to combine them into the FOUGM for more satisfactory performance, as illustrated by the simulation example shown in Fig. 7, where \(\gamma =0.7,\beta =0.9\), \(\mu =0.04\).

Fig. 7

Simulation results by FOUGM

The relative errors are \(\delta _a=0.481\%, \delta _b=2.743\%\). Clearly, faster convergence and superior estimation accuracy are obtained at the same time. Thus, the trade-off between convergence speed and estimation accuracy is resolved successfully by the proposed FOUGM.

5 Conclusions

In this paper, the FOUGM is proposed for FOSs whose measured output is corrupted by Gaussian white noise and outliers. Unlike general filtering and outlier detection methods, a novel outlier detection approach is developed via the nuclear norm and infinity norm; to the best of our knowledge, it is the first to detect outliers exactly and estimate noise simultaneously. The FOUGM is then designed based on the modified FOGM with a variable initial value mechanism and the fractional-order parameter update law, which, as the simulation results show, not only attains a faster convergence speed but also achieves satisfactory estimation accuracy. Note that the outlier detection method and the FOUGM can also be used in other applications and for general system identification, not just for FOSs.