1 Introduction

Time delays arise in many engineering, physical, chemical, biological and economic systems. In control systems, time delay due to sensor and actuator dynamics, signal transmission, and digital computation is an important factor that influences stability and control performance. To make matters worse, the time delay is often unknown. Time delay estimation in a control system is a challenging problem. It is even more challenging when the system dynamics are nonlinear and unknown. This paper presents a nonparametric identification technique to identify nonlinear dynamic systems and estimate the time delay introduced by the feedback control.

There have been many studies of time delay identification for control systems. Richard presented an overview of time delay estimation methods in [1]. Time delay estimation techniques based on pulsed inputs were developed in [2, 3]. The Padé approximation [4], modified least square and recursive methods [5–9], instrumental variable identification [10], neural networks [11], algebraic estimation [12, 13], adaptive techniques [14, 15], and non-commutative rings [16] are just a few popular methods for time delay estimation.

The methods for time delay estimation can be formulated in the frequency or time domain [17, 18]. In this paper, our focus is on time-domain approaches. Since the time delay usually appears in the system implicitly, conventional parameter estimation methods for dynamic systems cannot be applied directly to estimate it. The time delay \(\tau \) usually appears in the exponential term \(e^{-\tau s}\) of the system's transfer function. Expansion techniques can parameterize it, including the classical Padé approximation, the Laguerre Fourier series, the Kautz series, the second-order Padé shift and the diagonal Padé shift. The main concerns with rational approximations are the truncation error and stability complications. Even though higher-order expansions can reduce the truncation error, the approximated system can become unstable even when the original system is linear with a constant time delay [1]. In this paper, we employ the Taylor expansion. Xu demonstrated that a low-order Taylor expansion gives promising estimates of small time delays [19].
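To make the truncation error concrete, the following sketch (an illustration, not part of the paper's algorithm; `delay_taylor` is a hypothetical helper) compares a third-order Taylor truncation of \(e^{-\tau s}\) with the exact term on the imaginary axis, where stability is assessed:

```python
import math
import numpy as np

def delay_taylor(s, tau, order):
    """Truncated Taylor series of exp(-tau*s) about s = 0."""
    return sum((-tau * s) ** k / math.factorial(k) for k in range(order + 1))

# For a small delay the third-order truncation is already accurate,
# while the error grows quickly as the delay increases.
tau_small, tau_large, s = 0.05, 0.5, 2j   # s = j*omega at omega = 2 rad/s
err_small = abs(delay_taylor(s, tau_small, 3) - np.exp(-tau_small * s))
err_large = abs(delay_taylor(s, tau_large, 3) - np.exp(-tau_large * s))
```

This illustrates why a low-order expansion is adequate only for the small delays targeted in this paper.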

The work in [20] presents a nonlinear least square-based algorithm in which the instrumental variable method estimates the parameters of the transfer function of the system, while an adaptive gradient-based iteration finds the optimal time delay from the filtered, irregularly sampled data. The main problem with this algorithm is that the proposed cost function may have several local minima, so the result depends strongly on the initial guess for the parameters, and especially for the time delay. To deal with this issue, the authors use a low-pass filter to widen the convergence region around the global minimum. Another nonlinear recursive optimization algorithm is proposed in [21], which combines the linear Levenberg-Marquardt method to compute the plant parameters with a modified Gauss–Newton algorithm to estimate the time delays. A low-pass filter and a binary transformation are applied to the data corrupted by white noise to create the regressor matrix for minimizing a quadratic cost function of the estimation error. The identification is online and is demonstrated on a MISO plant with multiple time delays. Similarly, the algorithm in [9] employs the Gauss–Newton method to estimate the time delay while the simplified refined instrumental variable (SRIVC) method is used to find the plant parameters.

The Taylor expansion is used to parameterize the system with explicit time delay in [15]. An adaptive law for the parameter estimation is proposed such that the estimation error converges. A recursive formula is introduced in [6] to improve the accuracy and convergence rate of another recursive algorithm for online estimation in [22]. The strategy in [8] aims at fractional time delay identification for discrete-time systems. It separates the influences of the system structure and the time delay by discretizing the system. With the help of the Kalman filter, the parameters are estimated recursively. The algorithm in [11] first parameterizes the system by a polynomial function and then trains a neural network to estimate the parameters and the time delay. Schweizer and Wolff's \(\sigma \) measure from copula theory, denoted as \(\sigma _{SW}\), is introduced to study the relationship of input-output signals for SISO systems [23]. It is found that the measure reaches its maximum when the time delay is removed from the data. This property offers an approach to estimate the time delay without the need to estimate other parameters.

Most existing methods for time delay estimation rely on knowledge of the system model. In this work, we propose the application of a recently developed nonparametric system identification technique to time delay estimation. Brunton and colleagues proposed a method called the sparse identification of nonlinear dynamics (SINDy) in 2016 for creating a sparse representation of the unknown nonlinear function of the system [24]. The method has attracted great attention from the community. It assumes that only a few important terms out of a library of functions are needed to describe the dynamics of the system. It combines machine learning and sparsity-promoting techniques to find the sparse representation in the space of possible functions to model nonlinear dynamical systems. The connection of the SINDy method to the Akaike information criterion (AIC) for model selection has been studied in [25]. Promising results for system identification problems such as hybrid dynamical systems, the chaotic Lorenz system and Burgers' partial differential equation have been obtained [25, 26]. The SINDy method applied to model predictive control delivers better performance, requires significantly less data, and is more computationally efficient and robust to noise than a neural network model [27]. Robustness to noise and the requirement for derivative measurements are concerns with the SINDy approach [28]. Total variation regularized derivatives are commonly used to estimate the derivatives [29, 30]. An integral form of the equations of motion in combination with sparse regression is proposed in [31]. An application to model identification of nonlinear mechanical systems is reported in [32].

In practical cases, the data is often contaminated with noise. Fliess and Ramirez proposed a robust and fast algebraic identification technique to estimate time-invariant linear systems without time delay [33]. Inspired by the algebraic operation, Belkoura in [12] and [13] first investigated the identifiability conditions for a general class of systems described by convolution equations. An algebraic formulation is then introduced for online estimation of the time delay and parameters for structured and arbitrary input–output signals.

In this paper, we extend the SINDy approach by combining it with the algebraic signal processing method to deal with the issues of measurement noise, initial conditions and derivative estimation. The algebraic operation generates useful signals for system identification while filtering out the noise. We apply the Taylor expansion to make the time delay appear as a parameter of the model to identify. A nonlinear extended state estimator is adopted for derivative estimation. As a result, we arrive at a robust sparse regression combined with cross validation and bootstrapping techniques for nonparametric system identification. The simulation and experimental results illustrate that the proposed algorithm can overcome the following limitations of the existing techniques:

  • Multiple local minima Reaching the global minimum is the main challenge for modified least square methods with a recursive approach. The sparse regression is a convex optimization problem with a global minimum [34].

  • Nonparametric nonlinear identification The only assumption the proposed algorithm makes about the structure of the system is that it is sparse in the space of basis functions. The proposed algorithm relies on the data to make the selection and is not limited to linear or SISO systems.

  • Noise resistance The sparse regression is inherently robust. The algebraic operation offers additional filtering of the noise. Furthermore, the proposed algorithm is equipped with bootstrapping to study the statistics of the estimates, such as their means and standard deviations.

  • Unstructured inputs Many classical strategies are designed for pulsed inputs. The proposed approach analyzes the input and output data without any assumption on the frame or structure of the inputs.

  • Initial conditions Another common assumption in time delay estimation is that the initial conditions are zero, which is rarely true in practical applications. The algebraic operation and operational calculus make the proposed algorithm independent of the initial conditions.

The rest of the paper is organized as follows. Section 2 introduces the assumptions and formulates the mathematical problem of system identification. The techniques of algebraic data preprocessing and derivative estimation, sparse regression in combination with bootstrapping resampling, and cross validation are explained in Sect. 3. Section 4 presents an example of a simulated nonlinear mass-spring-damper system under a proportional control with time delay. The experimental validation of the proposed algorithm on the rotary flexible joint made by Quanser is presented in Sect. 5. Finally, Sect. 6 concludes the paper.

2 Problem definition and assumptions

Consider a closed-loop second order system given by,

$$\begin{aligned} m{\ddot{x}}+g(x,{\dot{x}})=f(t)+b_{1}{\dot{x}}(t-\tau )+b_{0}x(t-\tau ) \end{aligned}$$
(1)

where \(m>0\) is the mass of the system, the function \(g(x,{\dot{x}})\) represents the nonlinear restoring or internal force of the system. The term f(t) contains the reference information as well as external disturbances. The control consists of the output and its first-order derivative with feedback gains \(b_{0}\) and \(b_{1}\), and a time delay \(\tau \). We make the following assumptions in this study.

2.1 Assumptions

  1. Only the system response x(t) is measured.

  2. The system is exposed to random excitations included in f(t).

  3. The measurement contains Gaussian white noise with zero mean, denoted as \(\epsilon _{x}\).

  4. The system structure in terms of the function \(g(x,{\dot{x}})\) is unknown.

  5. The system is second order.

This paper is focused on nonparametric identification of closed-loop nonlinear dynamical systems with a control time delay.

3 The proposed method

For the restoring force \(g(x,{\dot{x}})\) with an unknown structure, many candidate functions are available to approximate it, such as polynomial, trigonometric and exponential functions, or combinations of these. Because polynomials are popular basis functions for describing a wide range of dynamical systems, we use regular polynomials of x and \({\dot{x}}\) with time-invariant coefficients to explain the proposed method,

$$\begin{aligned} g(x,{\dot{x}})={\sum _{i=0}^{N}}{\sum _{j=0}^{M}}{c_{ij}}{x^{i}{\dot{x}}^{j}} \end{aligned}$$
(2)

where \(c_{ij}\) are the unknown coefficients of the polynomial. Other functions can also be considered with the proposed method.

For the system without any prior knowledge, the orders N and M of the polynomial are unknown. This study develops an algorithm to identify the polynomial defined in Eq. (2) from the response data such that it has a minimum number of terms and with the minimum orders N and M.
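As an illustration of how such a polynomial library can be assembled from response data (`polynomial_library` is a hypothetical helper, not the paper's code), each candidate term \(x^{i}{\dot{x}}^{j}\) becomes one column of a data matrix:

```python
import numpy as np

def polynomial_library(x, xdot, N, M):
    """Data matrix with one column per monomial x^i * xdot^j,
    for 0 <= i <= N and 0 <= j <= M."""
    cols, names = [], []
    for i in range(N + 1):
        for j in range(M + 1):
            cols.append(x ** i * xdot ** j)
            names.append(f"x^{i}*xd^{j}")
    return np.column_stack(cols), names

x = np.array([0.0, 0.5, 1.0, -0.5])
xd = np.array([1.0, 0.0, -1.0, 2.0])
Theta, names = polynomial_library(x, xd, 2, 2)  # (N+1)(M+1) = 9 candidate terms
```

The identification task then reduces to selecting a few of these columns and estimating their coefficients \(c_{ij}\).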

3.1 Algebraic operation

Noise robustness is an essential requirement of system identification techniques, as noise is unavoidable in real data. Time delays caused by sensors and actuators are usually so small that it is difficult or even impossible to estimate their correct values from noisy data. In addition to noise robustness, dynamical system identification methods require derivatives of the measured response time series. Traditional finite difference methods for estimating derivatives can amplify the noise. To deal with these issues, we propose to apply the algebraic operation to pre-process the data.

We use a nonlinear oscillator under a proportional control with gain \(k_{p}\) to illustrate the method.

$$\begin{aligned}&m{\ddot{x}}+c_{1}{\dot{x}}+c_{2}{\dot{x}}x^{2}+c_{3}{\dot{x}}^{3}\nonumber \\&\quad +k_{1}x+k_{3} x^{3}-k_{p}x(t-\tau )=f(t)\nonumber \\&x(0)=x_{0},\ {\dot{x}}(0)=v_{0} \end{aligned}$$
(3)

where m denotes the mass, \(c_{1}\), \(c_{2}\) and \(c_{3}\) stand for the damping coefficients, and \(k_{1}\) and \(k_{3}\) are the stiffness coefficients. The system starts from the initial conditions \((x_{0},v_{0})\), which are generally unknown. We introduce shorthand notations for the nonlinear terms.

$$\begin{aligned} z_{1}={\dot{x}}x^{2},\ \ z_{2}={\dot{x}}^{3},\ \ z_{3}=x^{3} \end{aligned}$$
(4)

Applying the Laplace transform to Eq. (3), we have

$$\begin{aligned}&m\left\{ s^{2}X(s)-sx_{0}-v_{0}\right\} \nonumber \\&\quad +c_{1}\left\{ sX(s)-x_{0}\right\} +k_{1}X(s)-k_{p} e^{-s\tau } X(s)=G(s) \end{aligned}$$
(5)

where

$$\begin{aligned} G(s)=F(s)-c_{2}Z_{1}(s)-c_{3}Z_{2}(s)-k_{3}Z_{3}(s), \end{aligned}$$
(6)

X(s) is the Laplace transform of x(t), \(Z_{i}(s)\) is the Laplace transform of \(z_{i}(t)\) and F(s) is the Laplace transform of f(t).

Consider the Taylor expansion of the exponential term \(e^{-s\tau }\)

$$\begin{aligned} e^{-s\tau }=1-s\tau +\frac{s^{2}\tau ^{2}}{2!}-\frac{s^{3}\tau ^{3}}{3!}+... \end{aligned}$$
(7)

We should point out that keeping too many higher order terms may not be beneficial to the identification process. We only need to keep a sufficient number of terms to generate enough equations to determine the unknown parameters including time delay.

As an example, we keep the terms up to the third order and substitute the Taylor expansion in Eq. (5).

$$\begin{aligned}&m\left\{ s^{2}X(s)-sx_{0}-v_{0}\right\} +c_{1}\left\{ sX(s)-x_{0} \right\} \nonumber \\&\quad +k_{1}X(s) -k_{p} \left\{ 1-s\tau +\frac{s^{2}\tau ^{2}}{2!}-\frac{s^{3} \tau ^{3}}{3!}\right\} X(s)=G(s) \end{aligned}$$
(8)

To eliminate the initial conditions, we differentiate Eq. (8) twice with respect to s. To eliminate the derivative terms in the time domain, we divide the resulting equation by \(s^{3}\). Returning to the time domain, we obtain an equation of signals

$$\begin{aligned} P_{f}(t)&=k_{p}\frac{\tau ^{3}}{3!}P_{1}(t)+\left\{ m-k_{p}\frac{\tau ^{2} }{2}\right\} P_{2}(t)+\left\{ c_{1}+k_{p}\tau \right\} P_{3} (t)\nonumber \\&\quad +c_{2}P_{4}(t)+c_{3}P_{5}(t)+\left\{ k_{1}-k_{p}\right\} P_{6} (t)+k_{3}P_{7}(t) \end{aligned}$$
(9)

where

$$\begin{aligned} P_{1}(t)&=6\int ^{(2)}x(t)-6\int tx(t)+t^{2}x(t)\nonumber \\ P_{2}(t)&=2\int ^{(3)}x(t)-4\int ^{(2)}tx(t)+\int t^{2} x(t)\nonumber \\ P_{3}(t)&=-2\int ^{(3)}tx(t)+\int ^{(2)}t^{2}x(t)\nonumber \\ P_{4}(t)&=\int ^{(3)}t^{2}z_{1}(t),\ P_{5}(t)=\int ^{(3)}t^{2} z_{2}(t)\nonumber \\ P_{6}(t)&=\int ^{(3)}t^{2}x(t),\ P_{7}(t)=\int ^{(3)}t^{2}z_{3} (t)\nonumber \\ P_{f}(t)&=\int ^{(3)}t^{2}f(t) \end{aligned}$$
(10)

Note that \(\int ^{(n)}\phi (t)\) denotes the multiple integral \(\int \limits _{0}^{t}\int \limits _{0}^{\sigma _{1}}...\int \limits _{0}^{\sigma _{n-1}}\phi (\sigma _{n})d\sigma _{n}...d\sigma _{1}\). Let \(t_{k}\) \((k=1,2,...,n_{t})\) be a set of sampled times. Define the error at time \(t_{k}\) as

$$\begin{aligned} e(k)&=k_{p}\frac{\tau ^{3}}{3!}P_{1}(t_{k})+\left\{ m-k_{p}\frac{\tau ^{2}}{2}\right\} P_{2}(t_{k})+\left\{ c_{1}+k_{p}\tau \right\} P_{3}(t_{k})\nonumber \\&\quad +c_{2}P_{4}(t_{k})+c_{3}P_{5}(t_{k})+\left\{ k_{1}-k_{p}\right\} P_{6} (t_{k})\nonumber \\&\quad +k_{3}P_{7}(t_{k})-P_{f}(t_{k}). \end{aligned}$$
(11)
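The nested integrals defining the signals \(P_{i}(t)\) in Eq. (10) can be evaluated numerically from sampled data; a minimal sketch on a uniform grid using a cumulative trapezoidal rule (the helper names are ours, not the paper's):

```python
import numpy as np

def cumtrapz0(y, dt):
    """Cumulative trapezoidal integral on a uniform grid, zero at t = 0."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1])) * dt
    return out

def nested_integral(y, dt, n):
    """Approximate the n-fold integral int^(n) y(t) used in Eq. (10)."""
    for _ in range(n):
        y = cumtrapz0(y, dt)
    return y

t = np.linspace(0.0, 2.0, 2001)
dt = t[1] - t[0]
I2 = nested_integral(t, dt, 2)  # double integral of t is t^3 / 6
```

Multiplications by t or \(t^{2}\) are applied to the integrand before integrating, exactly as the definitions in Eq. (10) prescribe.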

Let us introduce an error vector \({\mathbf {e}}\), a parameter vector \({\mathbf {c}}\) and a force vector \({\mathbf {p}}\) as follows.

$$\begin{aligned} {\mathbf {e}}&= [e(1),e(2),...,e(n_{t})]^{T},\nonumber \\ {\mathbf {c}}&= \left[ k_{p}\frac{\tau ^{3}}{3!},m-k_{p}\frac{\tau ^{2}}{2} ,c_{1}+k_{p}\tau ,c_{2} ,c_{3},k_{1}-k_{p},k_{3}\right] ^{T},\nonumber \\ {\mathbf {p}}&= [P_{f}(t_{1}),P_{f}(t_{2}),...,P_{f}(t_{n_{t}})]^{T}. \end{aligned}$$
(12)

Let \({\mathbf {P}}(x)\) be the \(n_{t}\times 7\) data matrix determined by the measurement of x(t) such that its \((k,j)^{th}\) component is given by \(P_{k,j}=P_{j}(t_{k})\). Then, Eq. (11) can be written in the matrix form as

$$\begin{aligned} {\mathbf {e}}({\mathbf {c}})={\mathbf {P}}(x){\mathbf {c}}-{\mathbf {p}}. \end{aligned}$$
(13)

To find the unknown parameters \({\mathbf {c}}\), we formulate a least squares problem as follows. For the cost function,

$$\begin{aligned} J({\mathbf {c}})={\mathbf {e}}^{T}{\mathbf {e}}=\left\| {\mathbf {e}}\right\| _{2}^{2} \end{aligned}$$
(14)

we find the parameter vector \({\mathbf {c}}\) such that J is minimized. The solution for the parameters \({\mathbf {c}}\) can be found either by matrix inversion or by an iterative search algorithm. In general, when the numbers of unknown parameters and data points \(n_{t}\) are large, as is the case for multi-degree-of-freedom systems, it is better to use a global search algorithm to compute the parameters.
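The unconstrained minimization of Eq. (14) is a linear least squares problem; a minimal sketch on synthetic data (the matrix and coefficients below are stand-ins, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(200, 7))                  # stand-in for the n_t x 7 matrix P(x)
c_true = np.array([0.1, 1.0, 0.5, 0.2, 0.0, 3.0, 0.8])
p = P @ c_true + 0.01 * rng.normal(size=200)   # force vector with small noise

# minimize J(c) = ||P c - p||_2^2 by linear least squares
c_hat, *_ = np.linalg.lstsq(P, p, rcond=None)
```

When the problem is well conditioned, this closed-form solve recovers the coefficient vector; the global search mentioned above becomes relevant for large, ill-conditioned problems.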

3.2 Sparse representation

Assume that the displacement measurement x(t) of the system in Eq. (3) and the external force f(t) are available and contaminated with random noise. We consider the polynomial model for the restoring force in Eq. (2). Two questions immediately arise:

What are the minimum orders N and M?

Are all the terms in the polynomial needed?

To answer these questions, we apply methods from statistical learning [35]. One popular approach is to start with sufficiently large orders N and M, or a large library of functions which may contain non-polynomial functions, and penalize the terms with small coefficients through a sparse regulator using the least absolute shrinkage and selection operator (LASSO) [35]. Recent studies on sparse identification of nonlinear dynamics (SINDy) have developed a computationally efficient algorithm to compute the solution of the sparse regression. LASSO regression adds an \(L^{1}\) regularization term to the cost function in Eq. (14) to penalize nonzero parameters that do not contribute to the system's dynamics

$$\begin{aligned} J_{\lambda }({\mathbf {c}})&=\left\| {\mathbf {e}}\right\| _{2}^{2} +\lambda \left\| {\mathbf {c}}\right\| _{1} \nonumber \\&=\left\| {\mathbf {P}}(x){\mathbf {c}}-{\mathbf {p}}\right\| _{2}^{2} +\lambda \left\| {\mathbf {c}}\right\| _{1} \end{aligned}$$
(15)

where \(\lambda >0\) is a preselected constant known as the sparse regulator and \(\left\| {\mathbf {c}}\right\| _{1}=\sum _{i}|c_{i}|\). The sparse regression algorithm in [24] is applied to find the parameter vector \({\mathbf {c}}\) that minimizes \(J_{\lambda }\) in Eq. (15). The algorithm is computationally efficient and robust to noise. It should be noted that the optimization problem in the framework of the LASSO is convex [34], which implies the existence of a unique optimal solution. The selection of the sparse regulator \(\lambda \) is critical. In Sect. 3.3, we explain how to employ cross-validation techniques to select a proper regulator value.
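A minimal coordinate-descent sketch of the LASSO step in Eq. (15); `lasso_cd` is our illustrative solver on synthetic data, not the algorithm of [24]:

```python
import numpy as np

def lasso_cd(P, p, lam, n_iter=200):
    """Minimize ||P c - p||_2^2 + lam * ||c||_1 by cyclic coordinate descent."""
    n, d = P.shape
    c = np.zeros(d)
    col_sq = np.sum(P ** 2, axis=0)
    r = p.copy()                        # residual p - P c
    for _ in range(n_iter):
        for j in range(d):
            r += P[:, j] * c[j]         # remove column j from the residual
            rho = P[:, j] @ r
            # soft-thresholding solves the one-dimensional subproblem exactly
            c[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_sq[j]
            r -= P[:, j] * c[j]
    return c

rng = np.random.default_rng(1)
P = rng.normal(size=(500, 7))
c_true = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 3.0, 0.8])  # sparse truth
p = P @ c_true + 0.01 * rng.normal(size=500)
c_sparse = lasso_cd(P, p, lam=20.0)
```

The soft threshold drives the inactive coefficients exactly to zero while only mildly shrinking the active ones, which is the selection behavior exploited in this paper.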

The LASSO penalizes the terms with small coefficients by regularization and keeps the important terms in the system model. In the real world, the time delay is usually small. Moreover, the time delay appears in the high-order terms of the Taylor expansion. Therefore, the LASSO tends to penalize these terms, such as the coefficient of the term \(P_{1}(t)\) in Eq. (11).

We can recover the terms involving the time delay in the following way. After each sparse regression computed with the SINDy algorithm, we keep all the terms involving the time delay as well as the terms selected by the LASSO regularization. The resulting data matrix, denoted by \({\mathbf {P}}_{s}(x)\), is substituted into Eq. (15) to compute the updated coefficients \({\mathbf {c}}\).
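This recovery step can be sketched as follows: the columns selected by the LASSO are merged with the protected time-delay columns to form \({\mathbf {P}}_{s}(x)\), which is then refit (here by plain least squares for simplicity; the paper substitutes \({\mathbf {P}}_{s}(x)\) back into Eq. (15)). The numbers below are illustrative:

```python
import numpy as np

def refit_with_protected(P, p, c_lasso, protected, tol=1e-3):
    """Keep LASSO-selected columns plus the protected time-delay columns,
    then refit on the reduced matrix P_s(x)."""
    keep = np.abs(c_lasso) > tol
    keep[list(protected)] = True        # never drop the delay terms
    c = np.zeros_like(c_lasso)
    c[keep], *_ = np.linalg.lstsq(P[:, keep], p, rcond=None)
    return c

rng = np.random.default_rng(2)
P = rng.normal(size=(300, 7))
c_true = np.array([0.002, 1.0, 0.0, 0.0, 0.0, 3.0, 0.8])  # tiny delay coefficient
p = P @ c_true
c_lasso = np.array([0.0, 0.98, 0.0, 0.0, 0.0, 2.97, 0.77])  # delay term shrunk to zero
c_hat = refit_with_protected(P, p, c_lasso, protected=[0])
```

Because the refit is unpenalized, the small delay coefficient is recovered even though the LASSO had shrunk it to zero.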

3.3 Cross validation and bootstrapping

It is common to use cross-validation techniques from machine learning to determine the sparse regulation parameter \(\lambda \). The value of \(\lambda \) is sampled over a finite interval. The cross-validation mean square error \(MSE_{CV}\) of the model over the test dataset is computed, and the \(\lambda \) value that minimizes this error is selected. The corresponding model is chosen as the optimal model with a proper balance of complexity and accuracy.

Let \(T_{cv}\) denote the set of time instances of the test signal to be used for cross validation. For a given regulation parameter \(\lambda \), the mean square cross validation error \(MSE_{CV}\) of the model over all the validation datasets is defined as

$$\begin{aligned} MSE_{CV}(\lambda )=\frac{1}{n_{t}} \left\| {\mathbf {P}}(x(t\in T_{cv} )){\mathbf {c}}_{\lambda }-{\mathbf {p}}(t\in T_{cv})\right\| _{2}^{2} \end{aligned}$$
(16)

where \({\mathbf {c}}_{\lambda }\) denotes the model parameters for the regulation parameter \(\lambda \). \(MSE_{CV}\) is an implicit function of \(\lambda \). The SINDy algorithm attempts to select \(\lambda \) on the Pareto front of the multi-objective optimization problem whose objectives are the accuracy and the complexity of the model. The elbow of the Pareto front parameterized by \(\lambda \) is often the choice [24].
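The \(\lambda \) sweep can be sketched with sequentially thresholded least squares, the solver commonly used by SINDy; note that here \(\lambda \) acts as a hard threshold on the coefficients rather than as the \(L^{1}\) weight of Eq. (15) — an assumption made for this illustration, with synthetic data:

```python
import numpy as np

def stlsq(P, p, lam, n_iter=10):
    """Sequentially thresholded least squares: alternately fit and
    zero out coefficients with magnitude below lam."""
    c, *_ = np.linalg.lstsq(P, p, rcond=None)
    for _ in range(n_iter):
        small = np.abs(c) < lam
        c[small] = 0.0
        big = ~small
        if big.any():
            c[big], *_ = np.linalg.lstsq(P[:, big], p, rcond=None)
    return c

def select_lambda(P_tr, p_tr, P_cv, p_cv, lambdas):
    """Pick the value minimizing the cross-validation MSE of Eq. (16)."""
    mses = [np.mean((P_cv @ stlsq(P_tr, p_tr, lam) - p_cv) ** 2)
            for lam in lambdas]
    return lambdas[int(np.argmin(mses))]

rng = np.random.default_rng(3)
P = rng.normal(size=(400, 7))
c_true = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 3.0, 0.8])
p = P @ c_true + 0.05 * rng.normal(size=400)
lam_best = select_lambda(P[:300], p[:300], P[300:], p[300:],
                         np.logspace(-6, 0, 20))
```

Values of \(\lambda \) large enough to delete a true term inflate the validation error sharply, so the sweep settles below that point.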

Unfortunately, in most cases, the elbow of the Pareto front is ambiguous due to the existence of a cluster of candidate models near it. The information criteria (IC) for the candidate models can help to rank and select a model with a proper trade-off between accuracy and complexity [25]. Popular examples of information criteria include the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the deviance information criterion (DIC) and the minimum description length. The work reported in [25] makes use of a big data matrix \({\mathbf {P}}(x)\). The sparse regression procedure is repeated over all possible combinatorial subsets of the data matrix. The resulting models are ranked through their IC scores, and the model with the smallest score is selected.

However, a big data matrix \({\mathbf {P}}(x)\) may not always be available in real-world applications. In this work, we propose a search algorithm to determine the sparse representation of the polynomial in Eq. (2). We start with a linear model, \(N=M=1\), and increase the polynomial order until the prediction error of the model over the test dataset reaches an acceptably low level and begins to increase. The error over the test dataset is defined as,

$$\begin{aligned} MSE_{test}=\frac{1}{n_{t}} \left\| {\mathbf {P}}(x(t\in T_{cv})){\hat{\mathbf {c}}}-{\mathbf {p}}(t\in T_{cv})\right\| _{2}^{2} \end{aligned}$$
(17)

where \({\hat{\mathbf {c}}}\) stands for the estimated vector of coefficients. There are different ways to generate the training and test datasets. When the SINDy algorithm is applied to simulation examples, we can generate rich datasets to train and validate the model by considering the system responses for different initial conditions and excitations. In this work, we assume that the dataset consists of two long time series of the system response. One is used for training and the other for cross-validation. The data matrices \({\mathbf {P}}(x)\) are generated for different orders of the polynomial. The SINDy algorithm selects the sparse model built on the training data, while the regularization parameter \(\lambda \) is selected with the help of cross validation on the test dataset.

Real measurements often contain noise, and can have outliers and missing data. Here, we propose to combine the sparse regression with bootstrapping in order to develop a robust sparse regression. In particular, we consider K bootstrap sample vectors, each containing L elements of the original data points. Each vector is generated by uniform sampling of the data with replacement. For each bootstrap sample vector, the sparse regression is applied to identify the model. Finally, the parameters of the model are taken as the average of the K estimated coefficient vectors \({\hat{\mathbf {c}}}_{l}\).

$$\begin{aligned} {\tilde{\mathbf {c}}}=\frac{1}{K}{\sum _{l=1}^{K}}{\hat{\mathbf {c}}}_{l} \end{aligned}$$
(18)

The standard deviation of the estimated coefficient vectors \({\hat{\mathbf {c}}}_{l}\) is computed to study the variation of the parameter estimates.
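The bootstrap averaging of Eq. (18) can be sketched as follows (synthetic data; `bootstrap_estimate` is our illustrative helper, and ordinary least squares stands in for the sparse regression of each resample):

```python
import numpy as np

def bootstrap_estimate(P, p, K, L, fit, rng):
    """Average K coefficient estimates, each fitted on L rows drawn
    uniformly with replacement (Eq. (18)), and report their spread."""
    est = []
    for _ in range(K):
        idx = rng.integers(0, P.shape[0], size=L)   # sampling with replacement
        est.append(fit(P[idx], p[idx]))
    est = np.array(est)
    return est.mean(axis=0), est.std(axis=0)

rng = np.random.default_rng(4)
P = rng.normal(size=(600, 7))
c_true = np.array([0.1, 1.0, 0.5, 0.0, 0.0, 3.0, 0.8])
p = P @ c_true + 0.02 * rng.normal(size=600)
ols = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]
c_mean, c_std = bootstrap_estimate(P, p, K=30, L=300, fit=ols, rng=rng)
```

The returned standard deviations play exactly the diagnostic role described above: large spreads flag coefficients that are sensitive to the particular data sample.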

Algorithm 1 summarizes the procedure.


Remark 3.1

Some remarks on the estimation error and the stability of the system are in order. The estimation error of the coefficients \({\mathbf {c}}\), and in particular of the time delay \(\tau \), can be attributed to two sources: the truncation error of the Taylor expansion and the regression error.

While keeping more higher-order terms of the Taylor expansion helps to reduce the truncation error, we run the risk of destabilizing the truncated model. We have found that truncating at the third-order term is a good compromise between accuracy, complexity and stability.
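Once the coefficient vector of Eq. (9) is estimated and the gain \(k_{p}\) is known, the time delay and the physical parameters follow by inverting the coefficient groupings; a round-trip sketch with illustrative numbers (not the paper's values):

```python
import numpy as np

def recover_parameters(c, k_p):
    """Invert the coefficient groupings of Eq. (9) for a known gain k_p:
    the P_1 coefficient k_p * tau^3 / 3! yields tau, and the rest follow."""
    tau = np.cbrt(6.0 * c[0] / k_p)
    m = c[1] + k_p * tau ** 2 / 2.0
    c1 = c[2] - k_p * tau
    k1 = c[5] + k_p
    return tau, m, c1, c[3], c[4], k1, c[6]

# Round trip with assumed true values tau=0.05, m=1, c1=0.5, k1=3
k_p, tau0, m0, c10, k10 = 2.0, 0.05, 1.0, 0.5, 3.0
c = np.array([k_p * tau0 ** 3 / 6.0, m0 - k_p * tau0 ** 2 / 2.0,
              c10 + k_p * tau0, 0.2, 0.1, k10 - k_p, 0.8])
tau, m, c1, c2, c3, k1, k3 = recover_parameters(c, k_p)
```

Because \(\tau \) enters through a cube root, a relative error in the first coefficient is attenuated by a factor of three in the recovered delay.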

Table 1 The true and estimated parameters of the simulation example

4 Simulated example

To demonstrate the proposed algorithm, we consider the second-order oscillatory system with nonlinear stiffness and damping in Equation (3). The following external forces are used to generate the training and test datasets.

$$\begin{aligned}&f(t)_{Train}=10\sin (4t)+40\sin (4t^{2})+\epsilon _{f} \end{aligned}$$
(19)
$$\begin{aligned}&f(t)_{Test}=10\sin (2t)+40\sin (8t)+\epsilon _{f} \end{aligned}$$
(20)

For the training dataset, the system starts from the initial conditions \(x_{0}=1\) and \(v_{0}=0\). For the test dataset, the initial conditions are \(x_{0}=-1\) and \(v_{0}=0.5\). The excitation contains a normally distributed random noise \(\epsilon _{f}\) with zero mean and standard deviation \(\sigma _{\epsilon _{f}}=0.01\). We assume that the sensors have measurement noises such that the system output is given by \(x(t)+\epsilon _{x}\) where \(\epsilon _{x}\) is the normally distributed random noise with zero mean and standard deviation \(\sigma _{\epsilon _{x}}=0.01\).

Fig. 1 The cross validation test error for the mass-spring-damper system

Table 2 lists the system parameters used in the simulation. A proportional control with gain \(k_{p}=2\) is considered. We employ the third-order Taylor expansion of the delayed term \(x(t-\tau )\). The order of the expansion is selected such that there are enough equations to solve for the unknown parameters of the system. Following Algorithm 1, we create the data matrices \({\mathbf {P}}(x)\) with polynomials of orders from 1 to 6 and \(N=M\). For each polynomial model, 30 bootstrapping samples, each containing 50 percent of the total data, are selected. 50 values of \(\lambda \) are sampled logarithmically in the range from \(10^{-6}\) to \(10^{0}\).

Because the first term includes the time delay, we constrain the LASSO algorithm to keep it from being penalized and removed in the search for a sparse representation. The average of the sparse polynomials over all the bootstrapping samples is taken as the final result.

Figure 1 shows the variation of the cross validation error \(MSE_{CV}\) as a function of the order N of the polynomial. It is seen from the figure that the cross validation error drops sharply when approaching \(N=3\), stays in the same range for \(N=4\), and starts increasing from \(N=5\). We choose \(N=3\) as the optimal order of the polynomial, where the validation error is minimal and the order is the lowest. Table 1 lists the estimated coefficients of the signals in Eq. (9). The proposed algorithm has successfully detected the sparse terms and accurately estimated the coefficients. The time delay and the parameters of the original system are calculated and listed in Table 2 together with the known values. The accuracy of the estimated parameters is quite acceptable.

Remark 4.1

A remark on the small values of the standard deviation in Table 1 is in order. They indicate that the randomness of the data is not significant, so that the bootstrapping samples are not sufficiently different. We have checked the results several times to confirm this observation.

Table 2 The parameters of the mass-spring-damper system for the simulation example
Fig. 2 The flexible joint setup by Quanser [36]

Table 3 The parameters of the rotary flexible link provided by Quanser [37]
Fig. 3 The system response to track a sinusoidal trajectory \(\theta _{d}\) with amplitude 2 radian and frequency 0.66 Hz

5 Experimental example

Fig. 4 The system response to a square-wave signal \(\theta _{d}\) with amplitude 1 radian and frequency 0.66 Hz

In this section, we employ a flexible joint experimental setup made by Quanser to demonstrate the effectiveness of the method on experimental data from a single-input multiple-output (SIMO) dynamical system. The joint is connected to the base through two springs with equal stiffness \(k_{s}\), and the base is fixed to the servo motor as Fig. 2 shows. The servo motor with input voltage \(V_{m}\) generates a torque \(\tau _M\) to turn the base by a rotation angle \(\theta \). The deflection angle of the joint relative to the base is \(\alpha \). The moment of inertia of the base is \(J_{eq}\). The viscous damping coefficient of the base is \(B_{eq}\). \(J_{l}\) stands for the moment of inertia of the joint. The equations of motion for the system can be derived as,

$$\begin{aligned}&(J_{eq}+J_{l})\ddot{\theta }+J_{l}{\ddot{\alpha }}+B_{eq}\dot{\theta }=\tau _M,\nonumber \\&J_{l}{\ddot{\alpha }}+J_{l}\ddot{\theta }+B_{l}{\dot{\alpha }}+k_{s} \alpha =0, \nonumber \\&\tau _M=\frac{\eta _{g}k_{g}\eta _{m}k_{t}(V_{m}-k_{g}k_{m}{\dot{\theta }})}{R_{m}} \equiv a\cdot V_{m} + b\cdot {\dot{\theta }}, \nonumber \\&a=\frac{\eta _{g}k_{g}\eta _{m}k_{t}}{R_{m}},\ \ b=-\frac{\eta _{g}k_{g}\eta _{m}k_{t}k_{g}k_{m}}{R_{m}}, \end{aligned}$$
(21)

where the torque \(\tau _M\) depends linearly on the servo motor voltage \(V_{m}\) and on \({\dot{\theta }}\). We assume that the motor parameters are known. Table 3 lists the parameters of the motor and joint provided by Quanser. We should point out that these numbers may not exactly match the physical system parameters, so we treat them only as reference values.

A proportional control with gain \(k_{p}\) adjusts the motor voltage to make the joint follow the desired trajectory of the angle \(\theta \) while the deflection angle \(\alpha \) stays minimum. The control is given by

$$\begin{aligned} V_{m}=k_{p}(\theta _{d}-\theta ) \end{aligned}$$
(22)

The control is implemented in Simulink/MATLAB. The outputs of the system are \(\theta \) and \(\alpha \). A time delay is introduced in the feedback signal \(\theta \).

Fig. 5 The cross-validation error of the rotary flexible joint

In this example, we employ the second order Taylor expansion of the time delay term.

$$\begin{aligned} \theta (t-\tau )=\theta (t)-\tau {\dot{\theta }}(t)+\frac{\tau ^{2}}{2}\ddot{\theta }(t) \end{aligned}$$
(23)

The closed-loop system can be written as,

$$\begin{aligned}&\left( J_{eq}+ak_{p}\frac{\tau ^{2}}{2}\right) \ddot{\theta }+(B_{eq}-ak_{p}\tau )\dot{\theta }-B_{l} {\dot{\alpha }}-k_{s}\alpha \nonumber \\&\quad =ak_{p}(\theta _{d}-\theta )+b{\dot{\theta }}, \end{aligned}$$
(24)
$$\begin{aligned}&J_{eq}{\ddot{\alpha }}-ak_{p}\frac{\tau ^{2}}{2}\ddot{\theta }+(ak_{p}\tau -B_{eq}){\dot{\theta }}+\frac{B_{l}(J_{l}+J_{eq})}{J_{l}}{\dot{\alpha }} \nonumber \\&\quad +\frac{k_{s}(J_{l}+J_{eq})}{J_{l}}\alpha =-ak_{p}(\theta _{d}-\theta )-b{\dot{\theta }}. \end{aligned}$$
(25)
Table 4 The \(\theta \) and \(\alpha \) terms, their coefficients and the standard deviations of the estimation for the rotary flexible joint

We should point out that the tracking performance of the closed-loop system will not be as good as we would like. This is because the proportional control alone is not adequate, and the time delay further deteriorates the performance. This control with time delay was chosen simply to generate the data in a stable manner.

To generate the training dataset, we select a desired trajectory \(\theta _{d}\) as

$$\begin{aligned} \theta _{d}=2\sin 2\pi f_{1}t \end{aligned}$$
(26)

where \(f_{1}=0.66\,\mathrm {Hz}\) is the frequency of the desired trajectory \(\theta _{d}\). To generate the test data, we choose a square-wave signal for the desired trajectory \(\theta _{d}\) with the same amplitude and frequency as in Equation (26). We introduce a delay of 0.2 s in the control, for which the closed-loop system remains stable.
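The two reference signals can be generated as in the following minimal sketch; the sampling grid and the sign-based square wave are our own choices, since the Simulink source blocks are not specified in the text.

```python
import numpy as np

f1, amp = 0.66, 2.0                       # frequency (Hz) and amplitude of theta_d
t = np.arange(0.0, 60.0, 0.001)           # one minute at a 0.001 s sample time
phase = 2.0 * np.pi * f1 * t

theta_d_train = amp * np.sin(phase)                         # Eq. (26), training data
theta_d_test = np.where(np.sin(phase) >= 0.0, amp, -amp)    # square wave, test data
```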

Figures 3 and 4 show the motor input voltage \(V_{m}\) and the sample responses of \(\theta \) and \(\alpha \). The time series is one minute long with a sample time of 0.001 s. For simplicity, we assume that for each coordinate \(\theta \) and \(\alpha \), the restoring force can be represented by a sum of four single-variable polynomials, i.e. in \({\theta ^{i}}\), \({{\dot{\theta }}^{i}}\), \({\alpha ^{j}}\), and \({{\dot{\alpha }}^{j}} \), together with a friction term related to \({\dot{\theta }}\), without cross-product terms. The second coefficient in Equation (25), \(ak_{p}\frac{\tau ^{2}}{2}\), includes the time delay. We apply a constraint in the LASSO algorithm to keep this coefficient from being penalized and removed.
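Keeping one coefficient out of the \(\ell _{1}\) penalty can be sketched by partialling the unpenalized column out of the LASSO problem. The function below is our own illustration, not the paper's solver: scikit-learn's `Lasso` is used as a stand-in (its penalty weight is scaled by \(1/2n\) relative to the plain \(\ell _{1}\) formulation), and the column `z` stands for the \(\ddot{\theta }\) regressor carrying the time delay.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_with_unpenalized(X, z, y, lam):
    """Minimize ||y - X b - z g||^2 + lam*||b||_1 with g unpenalized.
    Partialling out: project z away, run LASSO on the residuals, then
    recover g by least squares."""
    z = z.reshape(-1, 1)
    M = np.eye(len(y)) - z @ np.linalg.pinv(z)   # annihilator of span(z)
    model = Lasso(alpha=lam, fit_intercept=False, max_iter=50000)
    model.fit(M @ X, M @ y)
    b = model.coef_
    g = (np.linalg.pinv(z) @ (y - X @ b)).item()
    return b, g
```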

Some signals in the matrix \({\mathbf {P}}(x)\) contain the first-order derivatives of the angles, \({\dot{\theta }}\) and \({\dot{\alpha }}\). Recall that the real measurements contain noise. We use a nonlinear state observer from the control literature to estimate the first-order derivatives without amplifying the noise in the computation [38, 39]. The second-order observer with gains \(\beta _{1}\) and \(\beta _{2}\) is defined by the following state equations.

$$\begin{aligned} {\dot{z}}_{1}&=z_{2}-\beta _{1}e,\nonumber \\ {\dot{z}}_{2}&=-\beta _{2}\mathrm {fal}(e,\alpha ,\delta ) \end{aligned}$$
(27)

where

$$\begin{aligned} \mathrm {fal}(e,\alpha ,\delta )= {\left\{ \begin{array}{ll} |e|^{\alpha }\cdot \mathrm {sign}(e),\ \ \ |e|>\delta \\ \frac{e}{\delta ^{1-\alpha }},\ \ \ |e|\le \delta \end{array}\right. } \end{aligned}$$
(28)

and the error \(e=z_{1}-\theta \). The observer parameter \(\alpha \) here is not to be confused with the deflection angle. We have chosen \(\alpha =0.5\), \(\delta =0.05\), \(\beta _{1}=100\) and \(\beta _{2}=900\). According to [38, 39], the estimation error e converges to zero quickly. Consequently, by definition, \(z_{2}\) is an accurate estimate of the first-order derivative \({\dot{\theta }}\). Figures 3 and 4 show the estimated \({\dot{\theta }}\) and \({\dot{\alpha }}\) for both the training and test datasets.
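A minimal forward-Euler implementation of the observer in Eqs. (27) and (28), with the gains chosen above, can be sketched as follows. The integration scheme and function names are our own choices; the linear branch uses \(\delta ^{1-\alpha }\) so that the two branches of \(\mathrm {fal}\) match at \(|e|=\delta \).

```python
import numpy as np

def fal(e, alpha, delta):
    # Nonlinear observer gain: power law outside the band |e| <= delta,
    # linear inside it; delta**(1-alpha) makes the branches continuous.
    if abs(e) > delta:
        return abs(e)**alpha * np.sign(e)
    return e / delta**(1.0 - alpha)

def observe_derivative(theta, dt, beta1=100.0, beta2=900.0, alpha=0.5, delta=0.05):
    """Second-order observer of Eq. (27): z1 tracks theta, z2 estimates
    theta-dot. Integrated with a simple forward-Euler step of size dt."""
    z1, z2 = theta[0], 0.0
    dtheta_hat = np.zeros_like(theta)
    for k, th in enumerate(theta):
        e = z1 - th
        z1 += dt * (z2 - beta1 * e)
        z2 += dt * (-beta2 * fal(e, alpha, delta))
        dtheta_hat[k] = z2
    return dtheta_hat
```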

To train the model, we consider 10 bootstrapping samples, each containing 50 percent of the training dataset, and search the polynomial order from 1 to 7. For the sparsity regularizer \(\lambda \), 100 values are sampled logarithmically in the range from \(10^{-10}\) to \(10^{0}\).
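The hyperparameter grid and the bootstrap index sets can be sketched as below; the function name and the choice of sampling with replacement are our own assumptions.

```python
import numpy as np

# 100 values of the sparsity weight lambda, logarithmically spaced
# between 1e-10 and 1e0, as used in the training procedure.
lambdas = np.logspace(-10, 0, 100)

def bootstrap_indices(n_samples, n_boot=10, frac=0.5, seed=0):
    """Index sets for n_boot bootstrap samples, each covering frac of the data."""
    rng = np.random.default_rng(seed)
    size = int(frac * n_samples)
    return [rng.choice(n_samples, size=size, replace=True) for _ in range(n_boot)]
```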

Figure 5 shows the variation of the cross-validation error as a function of the polynomial order. The results suggest that for both coordinates \(\theta \) and \(\alpha \), the polynomial order \(N=5\) leads to the minimum cross-validation error and is therefore the optimal order.

Table 4 lists the polynomial terms, the corresponding coefficients and the associated standard deviations for the trained model of polynomial order \(N=5\). We should mention that the reported results have been rounded to \(10^{-5}\). The estimated value of the Coulomb friction \(f_{c}\) is not negligible and can play an important role in the system dynamics. We should point out that the linear model in Eq. (21) does not include a friction term, although the actual device always has friction. The friction is the primary reason for the discrepancy between the response predicted by Eq. (21) and the actual measurements.

Figures 3 and 4 indicate that the responses of the closed-loop system do not track the references accurately. This is partly due to the effect of the time delay and partly due to the poor control design, as discussed earlier. The oscillatory responses of the closed-loop system explain the high-order terms of the polynomials, particularly for the deflection angle \(\alpha \), as can be seen in Table 4. This highlights the ability of the proposed system identification algorithm to estimate the time delay and to identify the nonlinearities in the system when linear models are no longer adequate.

Table 5 The nominal and estimated values of the parameters of the rotary flexible link by Quanser [37]

The motor parameter a is known and \(k_{p}\) is the given control gain. Hence, from the coefficient \(-ak_{p}\frac{\tau ^{2}}{2}\), the time delay can be calculated; it is listed in Table 5 in comparison with the time delay we introduced in the control. The values of the parameters in Table 5 fall in a wide range from 0.002 to 2. Hence, the parameters can be several orders of magnitude apart, which makes it difficult to accurately estimate all of them in the presence of noise. This experimental study strongly demonstrates the robustness of the proposed algorithm to noise.
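Since the identified coefficient is quadratic in \(\tau \), the delay follows as \(\tau =\sqrt{2c/(ak_{p})}\), where c is the magnitude of the coefficient. The sketch below illustrates this inversion; the numerical values of a and \(k_{p}\) are hypothetical, chosen only for illustration.

```python
import numpy as np

def delay_from_coefficient(c, a, k_p):
    # Invert c = a*k_p*tau**2/2 (magnitude of the theta-dd coefficient)
    # for the time delay tau, with a and k_p known.
    return np.sqrt(2.0 * abs(c) / (a * k_p))

a, k_p, tau_true = 1.53, 2.0, 0.2            # hypothetical gains and the 0.2 s delay
c = a * k_p * tau_true**2 / 2.0              # coefficient the regression would return
tau_hat = delay_from_coefficient(c, a, k_p)
```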

6 Conclusions

In summary, we have demonstrated that the proposed algorithm is effective in obtaining the governing equations of nonlinear dynamical systems with time delay from noisy experimental data. It can accurately estimate the time delay in the feedback control. For the first time, this study extends sparse regression to nonlinear dynamical systems with time delay. We have equipped the sparse regression with an algebraic signal pre-processing step and a nonlinear state observer. These operations are essential to compute the needed derivatives of the measured time series and other signals without knowledge of the initial conditions, and to filter the noise due to random excitations and measurements. Both simulation and experimental results have been used to validate the algorithm. The algorithm demonstrates excellent identification performance.

The codes and all the data are available at: https://github.com/gleylaz/TimeDelay_SysID.