1 Introduction

In recent years, robots have played an increasingly significant role in daily services and industrial operations. Robots are typically designed either for accurate, prespecified continuous trajectory tracking in a structured environment or for human–robot interaction in an unstructured, complicated environment. When unknown disturbances occur, this can lead to problems such as performance deterioration and contact impact with the working environment [1, 2]. Controlling a manipulator to guarantee high tracking performance in the presence of unknown time-varying disturbances remains a challenge in the research community. In addition, conventional control approaches are often insufficient to deal with these problems effectively and require a lengthy design and complex tuning process [3]. Moreover, when unexpected contact/collision situations occur during tasks such as assembly, grinding and deburring, grasping, or the manipulation of deformable and delicate objects, the ability to respond accurately and rapidly to external disturbances essentially determines the feasibility and reliability of further robot applications. On the other hand, parameter tuning based on experiments makes it hard for the controller to reach an optimal state and inconvenient to implement. Exploring effective and practical methodologies for robot controller design with strong disturbance rejection ability has therefore been a main concern.

Many model-based control strategies have been developed to increase the tracking performance and reliability of robots [4,5,6,7]. These methods utilize exact robot dynamics for controller design and demonstrate superior performance in simulation environments. However, it is well known that the development of a real-world, easy-to-use robot system often suffers from many restrictions, such as system modeling errors, environment uncertainties, and limits on algorithm complexity. Therefore, modern control strategies based on an explicit model of a specific system mostly remain in the theoretical design and numerical simulation phase. The classic PID controller is the most widely used controller in industrial applications. However, owing to its simple control law and limited parameter tuning range, a PID controller may struggle to give the robot both good dynamic and good static performance (the system performance after the transient process). In addition, when a robot must perform at high speed over a wide range of motion, a PID controller may no longer be effective and can even destabilize the controlled system [8].

To suppress disturbances, sliding mode control (SMC) and disturbance rejection control strategies are effective owing to their strong robustness to unknown exogenous disturbances, parameter variations, and model perturbations [9,10,11]. SMC is widely utilized in real applications [12]; however, traditional SMC cannot effectively handle rapidly varying disturbances and may cause a chattering phenomenon owing to modeling errors and uncertainties, which degrades its performance in robot control [13]. Wang et al. [14] proposed a robust SMC methodology for robotic systems with compliant actuators that employs a generalized proportional integral observer to estimate unknown disturbances. Van et al. [15] developed a tracking control approach for robot manipulators based on adaptive backstepping sliding mode control. However, these control methods are not designed for pure joint control, and hence the corresponding performance with six-joint simultaneous motion is not presented. As the number of robot DOFs increases, the dynamic coupling increases significantly while the modeling accuracy decreases, which causes rapidly varying endogenous disturbances and makes robot control problems much more difficult. Thus, conventional SMC is limited in real robot applications. In addition, the conventional design procedure requires prior knowledge and many tuning experiments.

Active disturbance rejection control (ADRC) is a type of disturbance rejection control method based on the PID concept [16, 17]. The ADRC method does not require an explicit plant model; instead, it designs a unique extended state observer (ESO) to estimate and compensate for the total disturbance before it affects the plant output [18]. The ADRC method only needs basic system information such as the order of the system and the control input/output, and it offers strong disturbance rejection ability and strong control robustness. In recent years, the ADRC method has been widely used in servo control systems [19], industrial process control [20], aerospace [21], and other research fields [22,23,24], thus exhibiting a promising future in industrial applications.

However, ADRC has so far seen relatively little application to robot manipulator control problems. Castaneda et al. [25] designed an adaptive controller based on ADRC to solve the trajectory tracking problem of a “Delta” parallel robot considering the uncertainty of the dynamics model. Talole et al. [26] designed an ESO-based feedback linearization controller for the trajectory tracking control of a flexible-joint robotic system; a rotary single-link robot experiment indicated the efficacy of the ADRC approach. Xue et al. [27] integrated ADRC with an existing PD structure for the set-point tracking control of robots, and the effectiveness of the proposed modularized ADRC design was tested on a 1-DOF rotary manipulator. Madonski [28] studied the problem of estimating and suppressing periodic disturbances in robot control within the ADRC framework, and experiments on a 3-DOF torsional plant demonstrated the effectiveness of the proposed scheme. Ren et al. [29] proposed a collision detection method for collaborative robots based on onboard sensors (encoders and torque sensors) using the ESO approach. Dong et al. [30] proposed a cascaded torque controller with an ADRC velocity inner loop to improve the control quality of the joint torque. In previous work, the authors proposed an efficient and simple robot controller based on the ADRC method to realize rapid and stable trajectory tracking [31].

All of these studies show that ADRC has great potential for robot control. However, for robot controllers that require high speed and high precision, conventional ADRC has a simple feedback law in which residual estimation error causes system performance deterioration. In addition, controller design steps such as parameter tuning are usually lengthy and experiment-based, so the obtained controller tends to be suboptimal and inconvenient to implement.

In summary, the design of a robot tracking controller needs to guarantee three major requirements: (1) fast transient response and high precision, (2) robustness to model uncertainties and strong disturbance rejection ability, and (3) a simple design and tuning process. Motivated by the above issues, in this work we developed a practical and effective control method to improve a robot system’s trajectory tracking performance under unknown time-varying disturbances. The main idea is to use the ADRC methodology to improve the robustness and accuracy of a traditional SMC controller. First, a tracking differentiator is used to obtain smoothed reference position and velocity trajectories. Then, an ESO is employed to estimate and compensate for the model uncertainties and unknown time-varying disturbances, which simplifies the SMC law and thus improves the tracking accuracy and robustness of the robot system. Furthermore, a learning-based parameter tuning method is presented to autonomously obtain the control parameters offline. To obtain a robust and transferable controller (meaning the obtained controller exhibits similar performance under uncertainties and can easily be transferred from simulation to the real robot), a multilayer perceptron (MLP) network [32] is used to learn the joint actuation ability. Simulations and experiments are conducted to validate the proposed control design methodology.

Succinctly, the main contributions of this paper are:

  • Design of a practical and effective control method that improves a robot system’s trajectory tracking performance under unknown time-varying disturbances.

  • An autonomous learning-based controller design methodology is presented to obtain the optimal control parameters.

  • An actuation network is proposed to learn the joint actuation ability in order to obtain a robust and transferable controller.

The rest of this study is organized as follows. A brief introduction to the dynamic model of a robot and the design of the proposed disturbance rejection SMC are presented in Sect. 2. The learning-based parameter tuning methodology, including a trained joint actuation network, is developed in Sect. 3. Numerical simulations and experimental results in Sect. 4 demonstrate the effectiveness of the proposed robust control method by comparing it with three other control methods. Finally, the conclusions are drawn in Sect. 5.

2 Disturbance rejection SMC

2.1 System dynamics modeling

Based on the Euler–Lagrangian method, the dynamics equations of an n-joint robot can be derived in terms of its joint variables as follows [33]:

$$ {\mathbf{D}}({\mathbf{q}}){\mathbf{\ddot{q}}} \, + \, {\mathbf{C}}({\mathbf{q}},{\dot{\mathbf{q}}}){\dot{\mathbf{q}}} \, + \, {\mathbf{G}}({\mathbf{q}}) \, + \, {\mathbf{d}}({\mathbf{q}},{\dot{\mathbf{q}}},{\mathbf{\ddot{q}}},t) \, = \, {{\varvec{\uptau}}} $$
(1)

where \({\mathbf{q}}, \, {\dot{\mathbf{q}}}, \, {\mathbf{\ddot{q}}} \in {\mathbb{R}}^{n \times 1}\), respectively, represent the joint angle, velocity, and acceleration; \({{\varvec{\uptau}}} \in {\mathbb{R}}^{n \times 1}\) is the joint torque; \({\mathbf{D}}({\mathbf{q}}) \in {\mathbb{R}}^{n \times n}\) is the symmetric positive definite inertia matrix; \({\mathbf{C}}({\mathbf{q}},{\dot{\mathbf{q}}}) \in {\mathbb{R}}^{n \times n}\) represents the nonlinear Coriolis and centrifugal forces acting on the system; \({\mathbf{G}}({\mathbf{q}}) \in {\mathbb{R}}^{n \times 1}\) is the gravitational torque; and \({\mathbf{d}}({\mathbf{q}},{\dot{\mathbf{q}}},{\mathbf{\ddot{q}}},t) \in {\mathbb{R}}^{n \times 1}\) is the generalized system disturbance that contains the unmodeled system dynamics and external disturbances. For robot tracking control, the disturbance components that depend on \({\mathbf{\ddot{q}}}\) are of minor relevance [27], and \({\mathbf{\ddot{q}}}\) usually varies slowly in planned motions; hence, we ignore the \({\mathbf{\ddot{q}}}\)-dependent part of the generalized system disturbance \({\mathbf{d}}\) in the following sections.

Defining the variables as \({\mathbf{x}}_{1} = {\mathbf{q}}, \, {\mathbf{x}}_{2} = {\dot{\mathbf{q}}}, \, {\mathbf{u}} = {{\varvec{\uptau}}}, \, {\mathbf{y}} = {\mathbf{x}}_{1}\), the system dynamics (1) can be written in the following state-space form:

$$ \left\{ \begin{gathered} \dot{\mathbf{x}}_{1} = {\mathbf{x}}_{2} \hfill \\ \dot{\mathbf{x}}_{2} = {\mathbf{D}}^{ - 1} ({\mathbf{x}}_{1} )( - {\mathbf{C}}({\mathbf{x}}_{1} ,{\mathbf{x}}_{2} ){\mathbf{x}}_{2} \, - \, {\mathbf{G}}({\mathbf{x}}_{1} ) \, - \, {\mathbf{d}} \, + \, {\mathbf{u}}) \hfill \\ {\mathbf{y}} = {\mathbf{x}}_{1} . \hfill \\ \end{gathered} \right. $$
(2)
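As a concrete illustration, the state-space form (2) can be evaluated numerically once the model terms are available. The following Python sketch is our own hypothetical helper (not part of the paper's implementation); the function and argument names are assumptions:

```python
import numpy as np

def robot_state_derivative(x1, x2, u, D, C, G, d):
    """Evaluate the state-space dynamics (2) for an n-joint robot.

    D, C, G are callables returning the inertia matrix, Coriolis matrix,
    and gravity vector at the given states; d is the generalized
    disturbance vector.
    """
    x1_dot = x2
    # x2_dot = D^{-1}(x1) * (u - C(x1, x2) x2 - G(x1) - d)
    x2_dot = np.linalg.solve(D(x1), u - C(x1, x2) @ x2 - G(x1) - d)
    return x1_dot, x2_dot
```

For a single-link pendulum (unit mass and length), for instance, the helper reproduces the expected gravitational acceleration at the horizontal configuration.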

2.2 Control strategy

In this section, we present a trajectory tracking control framework (disturbance rejection sliding mode control, or DRSMC) for robots with unknown time-varying disturbances. Generally, the DRSMC method consists of an observer-based SMC law and an ADRC-based control architecture. The SMC law provides the basic control torque for trajectory tracking, and the ADRC-based control architecture provides both control operational information used in the SMC law and online disturbance compensation. A block diagram of the DRSMC strategy is shown in Fig. 1.

Fig. 1
figure 1

Control structure of DRSMC framework

In Fig. 1, \({\mathbf{q}}_{d}\) is the given signal of the desired positions; \({\tilde{\mathbf{q}}}_{d} , \, {\mathbf{\dot{\tilde{q}}}}_{d} , \, {\mathbf{\ddot{\tilde{q}}}}_{d}\) are reference trajectories obtained from the desired positions; \({\mathbf{q}}_{a} , \, {\dot{\mathbf{q}}}_{a}\) are the actual joint angles and joint velocities; \({\mathbf{z}}_{1} , \, {\mathbf{z}}_{2} , \, {\mathbf{z}}_{3}\) are the augmented system states; \({{\varvec{\uptau}}}_{c}\) is the command control torque; and \({{\varvec{\uptau}}}_{fw}\) is the feedforward torque generated by the feedforward controller.

The overall DRSMC scheme includes the tracking differentiator (TD) [17], extended state observer (ESO), and observer-based SMC law.

The TD is a preprocessing component that obtains the reference trajectory from the given signal of desired positions. A time-optimal differentiator can be obtained by solving the following equation:

$$ \left\{ \begin{gathered} \dot{v}_{1} = v_{2} \hfill \\ \dot{v}_{2} = - r{\text{sgn}}\left( {v_{1} - v + \frac{{v_{2} \left| {v_{2} } \right|}}{{2r}}} \right), \hfill \\ \end{gathered} \right. $$
(3)

where \(v_{1}\) is the generated trajectory tracking the desired signal \(v\), and \(v_{2}\) is its derivative; the parameter \(r\) can be selected to speed up or slow down the transient profile. The approximate discrete-time solution of Eq. (3) is obtained as follows [17]:

$$ \left\{ \begin{gathered} v_{1} (t + h){\mkern 1mu} = {\mkern 1mu} v_{1} (t){\mkern 1mu} + {\mkern 1mu} h \cdot v_{2} (t) \hfill \\ v_{2} (t + h){\mkern 1mu} = {\mkern 1mu} v_{2} (t){\mkern 1mu} + {\mkern 1mu} h \cdot {\text{fhan}}(v_{1} (t){\mkern 1mu} - {\mkern 1mu} v(t),{\mkern 1mu} c_{1} v_{2} (t),{\mkern 1mu} r_{0} ,{\mkern 1mu} h). \hfill \\ \end{gathered} \right. $$
(4)

Using Eq. (4) twice, we can obtain the second derivative of the desired trajectory simultaneously

$$ \left\{ \begin{gathered} q_{1} (t + h){\mkern 1mu} = {\mkern 1mu} q_{1} (t){\mkern 1mu} + {\mkern 1mu} h \cdot q_{2} (t) \hfill \\ q_{2} (t + h){\mkern 1mu} = {\mkern 1mu} q_{2} (t){\mkern 1mu} + {\mkern 1mu} h \cdot {\text{fhan}}(q_{1} (t){\mkern 1mu} - {\mkern 1mu} q_{d} (t),{\mkern 1mu} c_{1} q_{2} (t),{\mkern 1mu} r_{0} ,{\mkern 1mu} h) \hfill \\ q_{2}^{\prime } (t + h){\mkern 1mu} = {\mkern 1mu} q_{2}^{\prime } (t){\mkern 1mu} + {\mkern 1mu} h \cdot q_{3} (t) \hfill \\ q_{3} (t + h){\mkern 1mu} = {\mkern 1mu} q_{3} (t){\mkern 1mu} + {\mkern 1mu} h \cdot {\text{fhan}}(q_{2}^{\prime } (t){\mkern 1mu} - {\mkern 1mu} q_{2} (t),{\mkern 1mu} c_{2} q_{3} (t),{\mkern 1mu} r_{1} ,{\mkern 1mu} h) \hfill \\ \end{gathered} \right. $$
(5)

where \(q_{i} = \tilde{q}_{d}^{(i - 1)}\) are the generated reference trajectory for each joint, \(h\) is the controller instruction cycle, and \({\text{fhan}}(x_{1} , \, cx_{2} , \, r_{0} , \, h_{0} )\) is a nonlinear control function as follows:

$$ \left\{ \begin{gathered} d{\mkern 1mu} = {\mkern 1mu} r_{0} h_{0}^{2} ,{\mkern 1mu} a_{0} {\mkern 1mu} = {\mkern 1mu} h_{0} cx_{2} ,{\mkern 1mu} y{\mkern 1mu} = {\mkern 1mu} x_{1} {\mkern 1mu} + {\mkern 1mu} a_{0} \hfill \\ a_{1} {\mkern 1mu} = {\mkern 1mu} \sqrt {d(d{\mkern 1mu} + {\mkern 1mu} 8\left| y \right|)} \hfill \\ a_{2} {\mkern 1mu} = {\mkern 1mu} a_{0} {\mkern 1mu} + {\text{ sgn}}(y)(a_{1} {\mkern 1mu} - {\mkern 1mu} d)/2 \hfill \\ s_{1} {\mkern 1mu} = {\mkern 1mu} ({\text{sgn}}(y{\mkern 1mu} + {\mkern 1mu} d){\mkern 1mu} - {\text{ sgn}}(y{\mkern 1mu} - {\mkern 1mu} d))/2 \hfill \\ a{\mkern 1mu} = {\mkern 1mu} (a_{0} {\mkern 1mu} + {\mkern 1mu} y{\mkern 1mu} - {\mkern 1mu} a_{2} )s_{1} {\mkern 1mu} + {\mkern 1mu} a_{2} \hfill \\ s_{2} {\mkern 1mu} = {\mkern 1mu} ({\text{sgn}}(a{\mkern 1mu} + {\mkern 1mu} d){\mkern 1mu} - {\text{ sgn}}(a{\mkern 1mu} - {\mkern 1mu} d))/2 \hfill \\ \end{gathered} \right., $$
$$ {\text{fhan}}(x_{1} , \, cx_{2} , \, r_{0} , \, h_{0} ) \, = \, - r_{0} (a \, / \, d \, - {\text{ sgn}}(a))s_{2} \, - \, r_{0} {\text{sgn}}(a). $$
(6)

\({\text{fhan}}\) is a time-optimal solution that guarantees the fastest convergence of the generated reference trajectory to the desired trajectory [34]. The parameter \(r_{0}\) is called the tracking gain (or speed factor); it affects the rising speed of the generated reference trajectory \(q_{i}\) and approximately determines the bandwidth of the TD. The parameter \(h_{0}\) is a filter factor that suppresses high-frequency output oscillations and is usually set larger than the controller instruction cycle \(h\), and \(c\) is the damping factor that determines the dynamic characteristics of the TD’s transient tracking process. These parameters can be adjusted individually according to the desired speed and smoothness.
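Eqs. (4) and (6) translate directly into code. The following Python sketch is an illustrative implementation under the paper's notation (function and variable names are ours):

```python
import numpy as np

def fhan(x1, x2, r0, h0):
    # time-optimal control synthesis function of Eq. (6);
    # the caller passes x2 already scaled by the damping factor c
    d = r0 * h0 ** 2
    a0 = h0 * x2
    y = x1 + a0
    a1 = np.sqrt(d * (d + 8.0 * abs(y)))
    a2 = a0 + np.sign(y) * (a1 - d) / 2.0
    s1 = (np.sign(y + d) - np.sign(y - d)) / 2.0
    a = (a0 + y - a2) * s1 + a2
    s2 = (np.sign(a + d) - np.sign(a - d)) / 2.0
    return -r0 * (a / d - np.sign(a)) * s2 - r0 * np.sign(a)

def td_step(v1, v2, v, r0, c, h, h0):
    # one discrete tracking-differentiator step, Eq. (4):
    # v1 tracks the input v, v2 tracks its derivative
    return v1 + h * v2, v2 + h * fhan(v1 - v, c * v2, r0, h0)
```

Iterating `td_step` on a step input drives \(v_1\) to the input value and \(v_2\) back to zero without differentiating noise directly, which is exactly the preprocessing role the TD plays in Fig. 1.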

The basic idea of the ESO is to estimate the integrated system disturbance \({\mathbf{f}}_{w}\), which includes the unmodeled dynamics and unknown time-varying disturbances. The ESO uses the control input and the system output to construct an additional augmented system state. Considering the robot dynamics (2), the integrated system disturbance \({\mathbf{f}}_{w}\) is given by

$$ {\mathbf{f}}_{w} = - {\mathbf{D}}({\mathbf{x}}_{1} )^{ - 1} {\mathbf{d}}. $$
(7)

Augmenting \({\mathbf{f}}_{w}\) as an additional system state \({\mathbf{x}}_{3}\), system (2) can be expressed in the linear augmented state-space form as

$$ \left\{ \begin{gathered} \dot {\mathbf{x}}_{1} = {\mathbf{x}}_{2} \hfill \\ \dot {\mathbf{x}}_{2} = {\mathbf{x}}_{3} + {\mathbf{D}}^{{ - 1}} {\mathbf{u}} \hfill \\ \dot{\mathbf{x}} _{3} = \dot {\mathbf{f}} _{w} \hfill \\ {\mathbf{y}} = {\mathbf{x}}_{1} . \hfill \\ \end{gathered} \right. $$
(8)

According to the above-mentioned ADRC design methodology, a third-order linear ESO can be designed to estimate the integrated system disturbance:

$$ \left\{ \begin{gathered} {\mathbf{e}} = {\mathbf{z}}_{1} - {\mathbf{y}} \hfill \\ \dot {\mathbf{z}} _{1} = {\mathbf{z}}_{2} - \beta _{1} \cdot {\mathbf{e}} \hfill \\ \dot {\mathbf{z}} _{2} = {\mathbf{z}}_{3} - \beta _{2} \cdot {\mathbf{e}} + {\mathbf{b}}_{0} \cdot {\mathbf{u}} \hfill \\ \dot {\mathbf{z}} _{3} = - \beta _{3} \cdot {\mathbf{e}}, \hfill \\ \end{gathered} \right. $$
(9)

where \({\mathbf{b}}_{0}\) is the estimated value of the control amplification \({\mathbf{D}}^{ - 1}\), \({\mathbf{e}}\) is the estimate error of the joint angles, \({\mathbf{u}}\) is the control torque, \({\mathbf{y}}\) is the actual joint angle, \({\mathbf{z}}_{1} ,{\mathbf{z}}_{2} ,{\mathbf{z}}_{3}\) are the estimated states of \({\mathbf{x}}_{1} ,{\mathbf{x}}_{2} ,{\mathbf{x}}_{3}\), respectively, and \({{\varvec{\upbeta}}}_{1} ,{{\varvec{\upbeta}}}_{2} ,{{\varvec{\upbeta}}}_{3}\) are the diagonal observer gain matrices of the ESO. Defining \(\beta_{1i} \, = \, {{\varvec{\upbeta}}}_{1} (i,i), \, \beta_{2i} \, = \, {{\varvec{\upbeta}}}_{2} (i,i), \, \beta_{3i} \, = \, {{\varvec{\upbeta}}}_{3} (i,i), \, i \, = \, 1, \, 2, \, ... \, , \, n\), increasing \(\beta_{1i} ,\beta_{2i} ,\beta_{3i}\) reduces the estimate error and accelerates convergence. However, larger gains also make the ESO more sensitive to system noise. During preliminary design, \(\beta_{1i} , \, \beta_{2i} , \, \beta_{3i}\) can be chosen by a pole-placement method [35]:

$$ \beta_{1i} \, = \, 3\omega_{oi} , \, \beta_{2i} \, = \, 3\omega_{oi}^{2} , \, \beta_{3i} \, = \, \omega_{oi}^{3} , $$
(10)

where the tuning parameter \(\omega_{oi}\) is the respective observer bandwidth.
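A single-joint version of the linear ESO (9) with the pole-placement gains (10) can be sketched in a few lines. The Euler discretization, scalar \(b_0\), and step size \(h\) below are our illustrative assumptions:

```python
import numpy as np

def eso_step(z, y, u, b0, wo, h):
    """One Euler step of the third-order linear ESO (9) for a single joint.

    Gains follow the pole placement (10): beta1 = 3*wo, beta2 = 3*wo^2,
    beta3 = wo^3, placing all observer poles at -wo.
    """
    b1, b2, b3 = 3.0 * wo, 3.0 * wo ** 2, wo ** 3
    z1, z2, z3 = z
    e = z1 - y                       # estimate error of the joint angle
    return np.array([
        z1 + h * (z2 - b1 * e),      # position estimate
        z2 + h * (z3 - b2 * e + b0 * u),  # velocity estimate
        z3 + h * (-b3 * e),          # integrated disturbance estimate
    ])
```

Fed with the measured joint angle and the applied torque, \(z_3\) converges to the integrated disturbance \(f_w\), which is exactly the quantity compensated later in the control law.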

Remark 1:

Consider the linear ESO system (9). If the integrated system disturbance \({\mathbf{f}}_{w}\) is bounded and continuously differentiable, and the observer gains satisfy \(\beta_{1i} , \, \beta_{2i} , \, \beta_{3i} > 0\) and \(\beta_{1i} \beta_{2i} \, > \, \beta_{3i}\), then the estimate errors are bounded [36]. This assumption is practical for a real robot system, as the physical energy is limited and the mechanical system naturally filters physical signals. Note that the pole-placement gains (10) satisfy these constraints automatically, since \(\beta_{1i} \beta_{2i} = 9\omega_{oi}^{3} > \omega_{oi}^{3} = \beta_{3i}\) for any \(\omega_{oi} > 0\). We use these results as the stability constraints for the ESO design in Sect. 3.

We now introduce the observer errors:

$$ {\mathbf{E}} \, = \, [{{\varvec{\upeta}}}_{1} , \, {{\varvec{\upeta}}}_{2} , \, {{\varvec{\upeta}}}_{3} ]^{T} \, = \, [{\mathbf{z}}_{1} \, - \, {\mathbf{x}}_{1} , \, {\mathbf{z}}_{2} \, - \, {\mathbf{x}}_{2} , \, {\mathbf{z}}_{3} \, - \, {\mathbf{x}}_{3} ]^{T} . $$

Then, the estimated states can be written as

$$ {\mathbf{z}}_{1} \, = \, {\mathbf{x}}_{1} \, + \, {{\varvec{\upeta}}}_{1} , \, {\mathbf{z}}_{2} \, = \, {\mathbf{x}}_{2} \, + \, {{\varvec{\upeta}}}_{2} , \, {\mathbf{z}}_{3} \, = \, {\mathbf{x}}_{3} \, + \, {{\varvec{\upeta}}}_{3} . $$
(11)

Under the assumption that \({\dot{\mathbf{f}}}_{w}\) is bounded, the bound on \(\mathop {\lim }\limits_{t \to \infty } {\mathbf{E}}\) is given by [37]

$$ \mathop {\lim }\limits_{{t \to \infty }} \left| {{\mathbf{E}}_{i} } \right|{\mkern 1mu} \le \left[ \begin{gathered} 0 \hfill \\ (1{\mkern 1mu} - {\mkern 1mu} \frac{1}{{\lambda _{i} }}{\mkern 1mu} + {\mkern 1mu} \frac{1}{{\lambda _{i}^{2} }})M_{i} \hfill \\ \frac{3}{{ - \lambda _{i} }}M_{i} \hfill \\ \end{gathered} \right] , $$
(12)

where \({\mathbf{E}}_{i}\) is the subvector of \({\mathbf{E}}\) containing the i-th joint variables; for example, \({\mathbf{E}}_{1} = [z_{1} \, - \, x_{1} , \, z_{2} \, - \, x_{2} , \, z_{3} \, - \, x_{3} ]^{T}\), where \(z_{i} = {\mathbf{z}}_{i} (1),x_{i} = {\mathbf{x}}_{i} (1)\) are the corresponding states of joint 1. \(M_{i}\) is the i-th component of the upper bound \(\sup \left| {{\dot{\mathbf{f}}}_{w} } \right|\), and \(\lambda_{i}\) is the maximum real eigenvalue of the i-th observer error system matrix.

Denote by \({{\varvec{\upsigma}}}_{3}\) the vector of third-row bounds in (12):

$$ {{\varvec{\upsigma}}}_{3} = \left[ {\frac{3}{{ - \lambda_{1} }}M_{1} , \, \frac{3}{{ - \lambda_{2} }}M_{2} , \, ... \, , \, \frac{3}{{ - \lambda_{n} }}M_{n} } \right]^{T} . $$
(13)

Then, we obtain,

$$ \sup \left| {{{\varvec{\upeta}}}_{3} } \right| \, \le \, {{\varvec{\upsigma}}}_{3} . $$
(14)

It should be noted that the bounds in Eq. (14) are rather loose; more accurate bounds can be obtained by assuming that the disturbances occurring in engineering applications take typical forms. In addition, appropriate numerical simulations can help to determine more precise error bounds for a specific ESO.
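For quick numerical use, the bound (13) is straightforward to evaluate. The sketch below is our hypothetical helper (names are ours) computing \({{\varvec{\upsigma}}}_{3}\) from per-joint eigenvalues and disturbance-rate bounds:

```python
import numpy as np

def sigma3_bound(lam, M):
    # Eq. (13): sigma3_i = 3 * M_i / (-lambda_i), with lambda_i < 0 the
    # dominant real eigenvalue of the i-th observer error dynamics and
    # M_i an upper bound on |d f_w,i / dt|
    lam = np.asarray(lam, dtype=float)
    M = np.asarray(M, dtype=float)
    assert np.all(lam < 0.0), "observer error dynamics must be stable"
    return 3.0 * M / (-lam)
```

As expected, faster observer error dynamics (more negative \(\lambda_i\)) or slower disturbances (smaller \(M_i\)) tighten the bound.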

Remark 2:

The system model and some prior knowledge can be eliminated from the augmented system state \({\mathbf{x}}_{3}\) when the corresponding parts are eliminated from the ESO input \({\mathbf{u}}\) simultaneously. This makes the ESO easy to modify after the initial design. Generally, a more accurate system model and more prior knowledge mean that fewer unknown uncertainties remain in the system. Thus, the ESO produces more precise estimates and requires smaller observer gains, i.e., a lower observer bandwidth.

The observer-based SMC law is designed to realize trajectory tracking. The sliding mode surface \({\mathbf{s}}\) for robot system (1) is given by

$$ {\mathbf{s}} \, = \, {\mathbf{c}}_{1} {\mathbf{e}}_{1} \, + \, {\mathbf{e}}_{2} , $$
(15)

where \({\mathbf{e}}_{1} = {\mathbf{q}}_{d} - {\mathbf{q}}_{a}\) and \({\mathbf{e}}_{2} = {\dot{\mathbf{q}}}_{d} - {\dot{\mathbf{q}}}_{a}\) represent the state tracking errors, and \({\mathbf{c}}_{1} = {\text{diag}}(c_{11} ,c_{12} ,...,c_{1n} )\), where \(c_{11} ,c_{12} ,...,c_{1n} > 0\) are constant sliding-surface parameters.

Then, the reaching law is

$$ {\dot{\mathbf{s}}} \, = \, - {{\varvec{\upxi}}}{\text{sgn}} ({\mathbf{s}}) \, - \, {\mathbf{ks}}, $$
(16)

where \({{\varvec{\upxi}}} = {\text{diag}}(\xi_{1} ,\xi_{2} ,...,\xi_{n} )\), \(\xi_{1} ,\xi_{2} ,...,\xi_{n} > 0\) and \({\mathbf{k}} = {\text{diag}}(k_{1} ,k_{2} ,...,k_{n} )\), \(k_{1} ,k_{2} ,...,k_{n} > 0\).

Combining (1) with (9), (15), and (16), the observer-based SMC law is designed as follows:

$$ {{\varvec{\uptau}}} \, = \, {\mathbf{D}}({\mathbf{c}}_{1} {\dot{\mathbf{e}}}_{1} \, + \, {\mathbf{\ddot{q}}}_{d} \, + \, {{\varvec{\upxi}}}{\text{sgn}} ({\mathbf{s}}) \, + \, {\mathbf{ks}}) \, + \, {\mathbf{C}}_{0} ({\mathbf{q}},{\dot{\mathbf{q}}}){\dot{\mathbf{q}}} \, + \, {\mathbf{G}}_{0} ({\mathbf{q}}) \, - \, {\mathbf{Dz}}_{3} \, + \, {\mathbf{f}}_{c} , $$
(17)

where \({\mathbf{C}}_{0} ({\mathbf{q}},{\dot{\mathbf{q}}})\) and \({\mathbf{G}}_{0} ({\mathbf{q}})\) are the nominal system models of \({\mathbf{C}}({\mathbf{q}},{\dot{\mathbf{q}}})\) and \({\mathbf{G}}({\mathbf{q}})\), and \({\mathbf{f}}_{c}\) is the estimated bound of the system error chosen as

$$ {\mathbf{f}}_{c} \, = \, {\mathbf{D}}(({{\varvec{\upsigma}}}_{3} \, + \, {\mathbf{f}}_{u} ) \, \odot \, {\text{sgn}} ({\mathbf{s}}) \, - \, {\mathbf{f}}_{l} ). $$
(18)

where \(\odot\) is the Hadamard product operator, which represents the elementwise product of two matrices. \({\mathbf{f}}_{u}\) and \({\mathbf{f}}_{l}\) are the estimated upper and lower bounds of the initial states, respectively; hence, \({\mathbf{f}}_{u} \ge {\mathbf{f}}_{l}\). A larger \({\mathbf{f}}_{c}\) causes more severe chattering when the tracking errors \({\mathbf{e}}_{1} ,{\mathbf{e}}_{2}\) approach the sliding surface \({\mathbf{s}} = {\mathbf{0}}\). To obtain better control quality, we can include decay factors \(\zeta_{i} (t)\) to revise the estimated \({\mathbf{f}}_{c}\) as

$$ {\mathbf{f}}_{c}^{^{\prime}} \, = \, {\mathbf{D}}((\zeta_{1} (t){{\varvec{\upsigma}}}_{3} \, + \, \zeta_{2} (t){\mathbf{f}}_{u} ) \, \odot \, {\text{sgn}} ({\mathbf{s}}) \, - \, \zeta_{3} (t){\mathbf{f}}_{l} ), $$
(19)

where \(\zeta_{1} (t)\) is monotonically decreasing, and \(\zeta_{2} (t),\zeta_{3} (t)\) can be chosen as piecewise functions with \(\zeta_{2} (t) = \zeta_{3} (t) = 0\) for \(t \ge t_{0}\), where \(t_{0}\) is a given time.
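For a single joint, the observer-based SMC law (17) with the bound term (18) reduces to the scalar sketch below. This is our own illustration: the scalar arguments, and in particular passing the ESO estimate `z3` in from outside, are assumptions for clarity:

```python
import numpy as np

def smc_torque(qd, qd_dot, qd_ddot, q, q_dot, z3,
               D, C0q_dot, G0, c1, xi, k, sigma3, f_u, f_l):
    """Scalar (single-joint) form of the observer-based SMC law (17)-(18).

    z3 is the ESO estimate of the integrated disturbance f_w; C0q_dot and
    G0 are the nominal Coriolis and gravity torques at the current state.
    """
    e1 = qd - q                    # position tracking error
    e2 = qd_dot - q_dot            # velocity tracking error (= de1/dt)
    s = c1 * e1 + e2               # sliding surface, Eq. (15)
    f_c = D * ((sigma3 + f_u) * np.sign(s) - f_l)   # bound term, Eq. (18)
    return (D * (c1 * e2 + qd_ddot + xi * np.sign(s) + k * s)
            + C0q_dot + G0 - D * z3 + f_c)
```

With a converged observer (so that \(z_3 = f_w\)) and the bound terms set to zero, the \(-Dz_3\) term cancels the disturbance and the law reduces to nominal SMC, which is why the residual SMC gains \(\xi, k\) can be kept small.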


Summing up the above analysis, the DRSMC method illustrated in Fig. 1 can be obtained.

2.3 Stability analysis

Theorem 1:

Consider a robot system (1) under bounded time-varying disturbances, controlled by the observer-based SMC scheme in the form of (9), (15), (16), and (17). If the observer (9) has bounded estimate errors, the tracking error of system (1) converges to the desired equilibrium point asymptotically.

Proof:

Combining (15) and (17), and denoting the nonlinear dynamics terms by \({\mathbf{f}} = {\mathbf{C}}({\mathbf{q}},{\dot{\mathbf{q}}}){\dot{\mathbf{q}}} + {\mathbf{G}}({\mathbf{q}})\), the derivative of the sliding surface (15) can be rewritten as follows:

$$ \begin{gathered} {\dot{\mathbf{s}}} \, = \, {\mathbf{c}}_{1} {\dot{\mathbf{e}}}_{1} \, + \, {\dot{\mathbf{e}}}_{2} \, = \, {\mathbf{c}}_{1} {\dot{\mathbf{e}}}_{1} \, + \, {\mathbf{\ddot{q}}}_{d} \, - \, {\mathbf{\ddot{q}}} \hfill \\ \, = \, {\mathbf{c}}_{1} {\dot{\mathbf{e}}}_{1} \, + \, {\mathbf{\ddot{q}}}_{d} \, - \, {\mathbf{D}}^{ - 1} ( - {\mathbf{f}} \, - \, {\mathbf{d}}) \, - \, {\mathbf{D}}^{ - 1} {{\varvec{\uptau}}} \hfill \\ \, = \, - \, {{\varvec{\upxi}}}{\text{sgn}} ({\mathbf{s}}) \, - \, {\mathbf{ks}} \, - \, {\mathbf{D}}^{ - 1} ( - {\mathbf{f}} \, - \, {\mathbf{d}} \, + \, {\mathbf{C}}_{0} ({\mathbf{q}},{\dot{\mathbf{q}}}){\dot{\mathbf{q}}} \, + \, {\mathbf{G}}_{0} ({\mathbf{q}}) \, + \, {\mathbf{f}}_{c} ) \, + \, {\mathbf{z}}_{3} . \hfill \\ \end{gathered} $$
(20)

According to (11), we have

$$ {\dot{\mathbf{s}}} \, = \, - {{\varvec{\upxi}}}{\text{sgn}} ({\mathbf{s}}) \, - \, {\mathbf{ks}} \, - \, {\mathbf{D}}^{ - 1} {\mathbf{f}}_{c} \, + \, {{\varvec{\upeta}}}_{3} . $$
(21)

Consider the following Lyapunov function:

$$ V({\mathbf{s}}) \, = \, \frac{1}{2}{\mathbf{s}}^{T} {\mathbf{s}}. $$
(22)

The derivative of \(V({\mathbf{s}})\) yields

$$ \begin{gathered} \dot{V}({\mathbf{s}}) \, = \, {\mathbf{s}}^{T} {\dot{\mathbf{s}}} \, = \, {\mathbf{s}}^{T} ( - {{\varvec{\upxi}}}{\text{sgn}} ({\mathbf{s}}) \, - \, {\mathbf{ks}} \, - \, {\mathbf{D}}^{ - 1} {\mathbf{f}}_{c} \, + \, {{\varvec{\upeta}}}_{3} ) \hfill \\ \, = \, - {\mathbf{s}}^{T} {{\varvec{\upxi}}}{\text{sgn}} ({\mathbf{s}}) \, - \, {\mathbf{s}}^{T} {\mathbf{ks}} \, - \, {\mathbf{s}}^{T} (({{\varvec{\upsigma}}}_{3} \, + \, {\mathbf{f}}_{u} ) \odot {\text{sgn}} ({\mathbf{s}}) \, - \, {\mathbf{f}}_{l} \, - \, {{\varvec{\upeta}}}_{3} ) \hfill \\ \, = \, - \sum\limits_{i = 1}^{n} {\xi_{i} \left| {s_{i} } \right|} \, - \, \sum\limits_{i = 1}^{n} {k_{i} s_{i}^{2} } \, - \, \sum\limits_{i = 1}^{n} {\left| {s_{i} } \right|\left( ({{\varvec{\upsigma}}}_{3} )_{i} \, + \, ({\mathbf{f}}_{u} )_{i} \, - \, {\text{sgn}} (s_{i} )\left( ({\mathbf{f}}_{l} )_{i} \, + \, ({{\varvec{\upeta}}}_{3} )_{i} \right) \right)} \hfill \\ \, \le \, - \sum\limits_{i = 1}^{n} {\left| {s_{i} } \right|\left( ({{\varvec{\upsigma}}}_{3} )_{i} \, + \, ({\mathbf{f}}_{u} )_{i} \, - \, \left| {({\mathbf{f}}_{l} )_{i} } \right| \, - \, \left| {({{\varvec{\upeta}}}_{3} )_{i} } \right| \right)} \hfill \\ \, \le \, 0. \hfill \\ \end{gathered} $$
(23)

This means that the tracking errors \({\mathbf{e}}_{1} ,{\mathbf{e}}_{2}\) reach the sliding surface \({\mathbf{s}} = {\mathbf{0}}\) in finite time. The sliding motion is then described as

$$ {\mathbf{c}}_{1} {\mathbf{e}}_{1} \, + \, {\mathbf{e}}_{2} \, = \, {\mathbf{0}}. $$
(24)

Since \({\mathbf{c}}_{1} > 0\), system (24) is exponentially stable. This shows that the tracking error converges to the equilibrium point asymptotically under the proposed DRSMC control law, which completes the proof.

3 Learning-based controller design methodology

3.1 Optimal design of DRSMC parameters

From the analysis in the previous section, we can see that the proposed DRSMC has many more control parameters than a PID controller. In summary, the parameters in the TD tracker are the speed factor \(r_{0}\), filter factor \(h_{0}\), and damping factor \(c\); the parameters in the ESO are \({{\varvec{\upbeta}}}_{1} , \, {{\varvec{\upbeta}}}_{2} , \, {{\varvec{\upbeta}}}_{3} ,\) and \({\mathbf{b}}_{0}\); and the parameters in the observer-based SMC law are \({\mathbf{c}}_{1} , \, {{\varvec{\upxi}}},\) and \({\mathbf{k}}\). The choice of control parameters greatly impacts the closed-loop performance of the controlled system.

Among these parameters, speed factor \(r_{0}\), filter factor \(h_{0}\), and damping factor \(c\) can be easily chosen according to the rapidity requirement and the maximum acceleration that the system can actually provide. However, the other parameters need to be well-designed to obtain a satisfying result, and this design process is generally realized by utilizing empirical formulas or manual tuning.

Since the proposed robot controller directly controls the joint torques, we expect the parameter tuning process to be offline and automatic to reduce the potential risk of online robot operation. According to the separation principle [34], we can design the control parameters of the ESO and the SMC law separately based on the system response. These parameters are then further adjusted according to the complete closed-loop system response.

In this paper, a genetic algorithm (GA) [38] is used to optimize the remaining control parameters offline according to the control objective function. This process only requires an excitation source that can achieve stable motion of the robot system; here we use a simple PID controller. Since this controller only needs to ensure system stability, it is very easy to design and implement. With the excitation source, we let the robot track a group of desired trajectories and collect the control torques as well as the corresponding joint states, including the joint angles, velocities, and torques. Note that the desired trajectories need not be well-designed when optimizing the control parameters; however, the actual joint trajectories should cover a relatively wide frequency spectrum when training the actuator model [3].
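The offline GA tuning loop can be prototyped in a few lines. The following minimal real-coded GA is a generic sketch, not the authors' exact implementation; the selection, crossover, and mutation choices and all hyperparameters are our assumptions. In practice, `fitness` would evaluate an objective such as \(J(n)\) by simulating the closed loop for a candidate parameter value:

```python
import random

def genetic_minimize(fitness, bounds, pop_size=40, generations=60,
                     mutation_rate=0.2, seed=0):
    # minimal real-coded GA: elitism, blend crossover, Gaussian mutation
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness)[: pop_size // 2]  # keep best half
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            w = rng.random()
            child = w * a + (1 - w) * b            # blend crossover
            if rng.random() < mutation_rate:       # Gaussian mutation
                child += rng.gauss(0.0, 0.1 * (hi - lo))
            children.append(min(hi, max(lo, child)))
        pop = elite + children
    return min(pop, key=fitness)
```

For multi-parameter tuning (e.g., the ESO gains of all joints at once), each individual would simply become a vector instead of a scalar; the structure of the loop is unchanged.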

The complete DRSMC parameter design process is as follows:

figure a

We want the ESO to give fast and precise estimation with a certain degree of noise-suppression ability. To optimize the ESO parameters in Eq. (8), we use the following objective function:

$$ \begin{gathered} J(n) = \lambda_{1} J_{1} (n) + \lambda_{2} J_{2} (n) \hfill \\ = \frac{\lambda_{1}}{T}\int\limits_{t = 0}^{T} {\left( \lambda_{01} t\left\| {{{\varvec{\upeta}}}_{1} (t,n)} \right\|_{1} + \lambda_{02} t\left\| {{{\varvec{\upeta}}}_{2} (t,n)} \right\|_{1} \right) dt} + \lambda_{2} \left\| {\frac{{{\mathbf{u}}(t,n)}}{{\lambda_{03} {\mathbf{b}}_{0} (n){\mathbf{z}}_{3} (t,n)}} + \frac{{\lambda_{03} {\mathbf{b}}_{0} (n){\mathbf{z}}_{3} (t,n)}}{{{\mathbf{u}}(t,n)}}} \right\|_{\infty } , \hfill \\ \end{gathered} $$
(25)

where \({{\varvec{\upeta}}}_{i} (t,n)\), \({\mathbf{z}}_{3} (t,n)\), \({\mathbf{b}}_{0} (n)\), and \({\mathbf{u}}(t,n)\) represent the ESO estimation errors, augmented state, control amplification, and control torque for joint n, respectively; \(T\) is the simulation running time; and \(\lambda_{1} ,\lambda_{2} ,\lambda_{01} ,\lambda_{02} ,\lambda_{03}\) are the given weighting factors.

In Eq. (25), \(J_{1}\) is the optimization objective part, which reflects the ESO's ability for fast and precise estimation. \(J_{2}\) is the regularization part, designed to improve the disturbance suppression ability of the obtained ESO and to prevent the parameters from growing boundlessly during optimization. The optimization problem can then be solved by the GA under the parameter stability constraints.
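As an illustration, the objective in Eq. (25) can be evaluated numerically from recorded simulation traces. The following sketch is our own discrete-time treatment for a single joint with scalar traces; the function name and trapezoidal integration are assumptions, not part of the original method:

```python
import numpy as np

def eso_objective(t, eta1, eta2, u, z3, b0,
                  lam1=2.0, lam2=1.0, lam01=2.0, lam02=1.0, lam03=3.0):
    """Discrete-time evaluation of the ESO tuning objective in Eq. (25)
    for one joint. eta1/eta2 are ESO estimation-error traces, u the
    control-torque trace, z3 the augmented-state estimate, b0 the gain."""
    T = t[-1] - t[0]
    # J1: time-weighted L1 estimation errors, integrated by the trapezoid rule
    integrand = lam01 * t * np.abs(eta1) + lam02 * t * np.abs(eta2)
    J1 = np.trapz(integrand, t) / T
    # J2: regularization term, max-norm of u/(lam03*b0*z3) plus its reciprocal
    ratio = u / (lam03 * b0 * z3)
    J2 = np.max(np.abs(ratio + 1.0 / ratio))
    return lam1 * J1 + lam2 * J2
```

In a GA run, this function would be evaluated once per candidate parameter vector after a closed-loop simulation with those parameters.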

For the control law design, we want the controlled system to have a fast and smooth transient period with a small steady-state error. Similarly, the following objective function is constructed to optimize the SMC law parameters in Eq. (17) offline:

$$ J_{e} (n) = \lambda_{1} J_{1} (n) + \lambda_{2} J_{2} (n) + \lambda_{3} J_{3} (n), $$
(26)

where

$$ \begin{gathered} J_{1} (n) = \frac{1}{T}\int\limits_{t = 0}^{T} {\left( \varepsilon_{01} \left\| {{\mathbf{e}}_{1} (t,n)} \right\|_{2} + \varepsilon_{02} \left\| {{\mathbf{e}}_{2} (t,n)} \right\|_{2} \right) dt} , \hfill \\ J_{2} (n) = \max \left( \left| {\frac{{{\mathbf{e}}_{1} (t,n)}}{{{\mathbf{q}}_{d} (t,n)}}} \right| \right), \quad t \in [T_{0} , \, T], \hfill \\ J_{3} (n) = \overline{{\left\| {{\mathbf{e}}_{1} (t,n)} \right\|_{1} }} , \quad t \in [T - \Delta T , \, T]. \hfill \\ \end{gathered} $$
(27)

In Eq. (27), \(J_{1}\) reflects the general control ability over the complete tracking process; \(J_{2}\) represents the maximum relative tracking error for \(t > T_{0}\), where \(T_{0}\) is a given time used to exclude insensitive errors in the initial tracking phase; \(J_{3}\) represents the steady-state error; and \(\lambda_{1} , \, \lambda_{2} , \, \lambda_{3} , \, \varepsilon_{01} ,{\text{ and }}\varepsilon_{02}\) are the given weighting factors.
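The three components of Eq. (27) can likewise be computed from recorded error traces. This is a per-joint sketch under our own assumptions (scalar traces, trapezoidal integration, hypothetical function name):

```python
import numpy as np

def smc_objective(t, e1, e2, qd, T0, dT,
                  lam1=1.0, lam2=10.0, lam3=2000.0, eps01=2.0, eps02=1.0):
    """Evaluate the control-law tuning objective of Eqs. (26)-(27) for one
    joint from recorded position (e1) and velocity (e2) error traces."""
    T = t[-1] - t[0]
    # J1: averaged weighted error norms over the whole tracking process
    J1 = np.trapz(eps01 * np.abs(e1) + eps02 * np.abs(e2), t) / T
    # J2: maximum relative tracking error after the initial transient T0
    mask = t >= T0
    J2 = np.max(np.abs(e1[mask] / qd[mask]))
    # J3: mean absolute error over the final window [T - dT, T]
    tail = t >= (t[-1] - dT)
    J3 = np.mean(np.abs(e1[tail]))
    return lam1 * J1 + lam2 * J2 + lam3 * J3
```

The large weight on \(J_{3}\) in the paper's setting penalizes steady-state error most heavily, which matches the design intent stated above.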

When optimizing the SMC law parameters, we found that the given desired trajectory must be varied after several iterations of the optimization. Otherwise, the obtained control parameters make the robot performance sensitive to the desired input: the system achieves good performance only when tracking the training trajectory and lacks robustness to disturbances and unknown uncertainties. Changing the trajectory amplitude and using different trajectory forms during optimization help to obtain robust parameters.

Note that this design methodology is a general framework that can also be applied to other controllers such as PID, ADRC, and SMC. The offline tuning result can further provide guidance for online tuning toward the optimal achievable performance.

3.2 Modeling the actuation ability

In this paper, the actuation ability (see Fig. 2) is the control-to-torque relationship that includes all communication delays, current-loop dynamics, measurement noise, and hardware dynamics within one control loop. As an analytical actuation model is extremely difficult to derive [3], we used supervised learning to train an actuation network that outputs an estimated joint torque given a history of commanded control torques and joint velocities. Note that the obtained actuation network is only used in simulation, placed between the control output and the joint input torque, to make the simulation model more realistic. This MLP network also provides slightly jittery output for controller training, which helps prevent premature convergence and improves the robustness of the obtained controller. We assumed that the joint actuations are independent of each other; hence, we trained a network for each joint separately.

Fig. 2
figure 2

Actuation ability and training of actuation network

More precisely, we used an MLP with 4 hidden layers of 16 units each, as shown in Fig. 2. Each training sample consists of a history of the last 10 sampling periods (0.01 s in this work) of the commanded control torques and joint velocities. The history length should be neither too long nor too short: a too-dense history makes the model more prone to overfitting and computationally more expensive, while the history should remain sufficiently longer than the system communication delays and the mechanical response time, which is about three to four sampling periods in our system. Regarding feature selection, we found that the joint angle information is of no help in training this actuation network. By contrast, the joint velocity information is a necessary feature for this problem.
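A sketch of how such history features might be assembled for one joint (function name and array layout are our own assumptions; the paper does not specify the exact ordering):

```python
import numpy as np

def make_history_features(u_cmd, qdot, H=10):
    """Build actuation-network inputs: for each time step k >= H-1, stack the
    last H commanded torques and joint velocities of one joint into a single
    feature vector of length 2*H, matching the 10-sample (0.01 s) history."""
    N = len(u_cmd)
    X = np.empty((N - H + 1, 2 * H))
    for i, k in enumerate(range(H - 1, N)):
        X[i, :H] = u_cmd[k - H + 1:k + 1]   # torque history, oldest first
        X[i, H:] = qdot[k - H + 1:k + 1]    # velocity history, oldest first
    return X
```

The network target at step k would then be the measured joint torque at that step.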

The dataset, generated as described in the previous section, contains more than 500,000 samples. About 80% of the data were used for training, and the rest for validation. We chose the commonly used ELU (Exponential Linear Unit [39]) activation function for the MLP network. The root mean square (RMS) of the prediction error is used to evaluate the trained actuation network. Training one network takes about 3 h on one NVIDIA 12G GTX1050 Ti GPU.
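For concreteness, the forward pass of the 4×16 ELU network and the RMS evaluation metric can be sketched as follows (weights are hypothetical placeholders obtained from training, not values from the paper):

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential Linear Unit activation."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def mlp_forward(x, weights, biases):
    """Forward pass of the 4-hidden-layer, 16-unit ELU MLP described above;
    `weights`/`biases` are lists of layer parameters from training."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = elu(h @ W + b)
    return h @ weights[-1] + biases[-1]   # linear output: estimated torque

def rms_error(pred, target):
    """RMS prediction error used to evaluate the trained network."""
    return np.sqrt(np.mean((pred - target) ** 2))
```

With a 10-step history of torque and velocity, the input dimension is 20 and the output is the scalar estimated torque of one joint.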

The validation result with the obtained actuation network is shown in Fig. 3, where the ideal model has a zero communication delay, zero mechanical response time, and infinite bandwidth. Hence, the model can generate any commanded torque instantly. It can be observed that the trained model can simulate the dynamic performance of the torque response and has an average absolute error of 0.297 Nm on the validation set, which is lower than that of the ideal model (0.578 Nm). Although the static torque performance cannot be predicted precisely (see the curve before the step occurs in Fig. 3), this is reasonable and acceptable for the simulation and controller design. Moreover, the trained actuation network can also add structural noise to the simulation system, which is considered an effective way to improve the training of a model [40].

Fig. 3
figure 3

Validation of learned actuation network for commanded torque response

We should note that the simulated system can hardly represent the real system perfectly: modeling errors, machining errors, joint flexibility, the dynamics of the joint gear reducers, and other unmodeled dynamics all constrain the model's accuracy. Therefore, a controller designed in simulation faces more constraints in the real robot system, and learning-based offline training would yield parameters that are not robust enough if we used only the robot model built with the MATLAB/SimMechanics toolbox. The MLP network simulates the actuation ability (see Fig. 2) for training our controller and thus narrows this gap.

The controller parameters are finally acquired by offline optimization using the trained actuation network. The performance of the obtained controller in simulation and experiment can be seen in Fig. 4. The settling time in the experiment is about 0.412 s, while that in the simulation is about 0.249 s; the overshoot in the experiment is about 0.001°, while that in the simulation is about 0.003°. As can be seen in Fig. 4, a quite effective controller can be obtained in simulation without the actuation network; however, this controller may not achieve satisfactory performance on the real robot. Using the actuation network in training helps to obtain a more robust controller: the simulation performance is similar, but the controller performs much better on the real system. These results indicate that the proposed learning-based autonomous design methodology is practical and effective for controlling a complex robot system.

Fig. 4
figure 4

Performance of trained controller without using the actuation network (left) and performance of the controller using the actuation network in training (right)

4 Simulation and experimental results

In this section, the proposed control method is validated through simulation examples and experimental studies. Four comparative control strategies (the conventional PID controller, conventional SMC method [41], linear ADRC method [31], and the proposed DRSMC method) were tested. Since the built-in PID controller cannot track the step input, we set the built-in parameters as initial values in the PID optimization. The parameters of all of these controllers were optimally tuned using the methodology proposed in Sect. 3. In this particular case, the weighting factors in Eqs. (25)–(27) are chosen as

$$ \begin{gathered} \lambda_{1} = 2, \, \lambda_{2} = 1, \, \lambda_{01} = 2, \, \lambda_{02} = 1, \, \lambda_{03} = 3 \quad {\text{in ESO optimization}}, \hfill \\ \lambda_{1} = 1, \, \lambda_{2} = 10, \, \lambda_{3} = 2000, \, \varepsilon_{01} = 2, \, \varepsilon_{02} = 1 \quad {\text{in control law optimization}}. \hfill \\ \end{gathered} $$

The primary parameters in the GA method are chosen as follows: population size is 400, maximum number of iterations is 400, crossover fraction is 0.8, mutation fraction is 0.2, and migration fraction is 0.2.
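A minimal real-coded GA with the stated population size, iteration count, and crossover/mutation fractions can be sketched as follows. This is an illustrative implementation under our own assumptions (rank-based selection, blend crossover, clipped Gaussian mutation); the paper's GA details beyond the listed hyperparameters are not specified, and migration is omitted here:

```python
import numpy as np

def ga_minimize(objective, bounds, pop_size=400, n_gen=400,
                crossover_frac=0.8, mutation_frac=0.2, seed=0):
    """Sketch of a real-coded GA. `objective` maps a parameter vector to a
    scalar cost; `bounds` is an (n_params, 2) array of lower/upper limits."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    for _ in range(n_gen):
        cost = np.array([objective(p) for p in pop])
        pop = pop[np.argsort(cost)]          # rank the population by cost
        elite = pop[:pop_size // 2]          # parents: the better half survives
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            w = rng.random() if rng.random() < crossover_frac else 1.0
            child = w * a + (1 - w) * b      # blend crossover
            if rng.random() < mutation_frac:  # Gaussian mutation, clipped
                child = np.clip(child + 0.1 * (hi - lo)
                                * rng.standard_normal(len(lo)), lo, hi)
            children.append(child)
        pop = np.vstack([elite, children])
    cost = np.array([objective(p) for p in pop])
    return pop[np.argmin(cost)]
```

In the paper's setting, `objective` would wrap a closed-loop simulation and return the value of Eq. (25) or Eq. (26).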

We note that, compared with the conventional PID and SMC methods [42, 43], the proposed method does not noticeably increase the computational burden; the complete algorithm runs well at a 1 kHz frequency.

To realize feedforward compensation, we conducted dynamic identification for our robot using the method detailed in [44]. The joint friction is modeled with the following Coulomb-viscous friction model:

$$ \tau_{f,n} = {\mathbf{f}}_{c1} (n)\,{\text{sgn}}(\dot{q}_{n} ) + {\mathbf{f}}_{c2} (n)\dot{q}_{n} , $$
(28)

where \(\tau_{f,n}\) represents the joint friction of joint n, and \({\mathbf{f}}_{c1} ,{\mathbf{f}}_{c2} \in {\mathbb{R}}^{n \times 1}\) are the friction parameters.
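The friction model of Eq. (28) is straightforward to evaluate; a one-line vectorized sketch (function name assumed):

```python
import numpy as np

def friction_torque(qdot, fc1, fc2):
    """Coulomb-plus-viscous joint friction of Eq. (28), evaluated
    elementwise: fc1 * sgn(qdot) + fc2 * qdot."""
    return fc1 * np.sign(qdot) + fc2 * qdot
```

Passing the identified parameter vectors and the joint-velocity vector yields all six friction torques at once.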

The bandwidth of the external disturbances was tested to guide the design of the simulations and experiments. For this test, we set the robot in zero-force mode (only gravity compensation is provided) and manually dragged the robot through several trajectories. The frequency spectra of the joint torques under human force were analyzed to reveal the characteristics of the potential external disturbances in our system. The spectrum of one joint torque is shown in Fig. 5: the power spectrum decreases by about 3 dB from the 0-Hz value at 0.5 Hz, by about 10 dB at 1.0 Hz, and by more than 20 dB at 4.0 Hz. In other words, more than 90% of the power caused by external effects in joint space is concentrated within a bandwidth of 4.0 Hz, meaning that external disturbances applied to the robot mainly generate disturbance torques within a 4.0-Hz bandwidth in joint space. As all of the joint spectra have similar characteristics, the bandwidth of the external disturbances in robot joint space can be approximately considered as 4.0 Hz.

Fig. 5
figure 5

Power spectrum of joint torque when external force is applied
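The spectral drop described above can be estimated from a recorded torque trace. The sketch below uses a simple periodogram rather than an averaged estimate, and the function name is our own:

```python
import numpy as np

def spectrum_drop_frequency(tau, fs, drop_db=20.0):
    """Rough estimate of the frequency at which the joint-torque power
    spectrum first falls `drop_db` dB below its 0-Hz value (cf. Fig. 5).
    `tau` is a uniformly sampled torque trace, `fs` the sample rate in Hz."""
    psd = np.abs(np.fft.rfft(tau)) ** 2          # one-sided periodogram
    freqs = np.fft.rfftfreq(len(tau), d=1.0 / fs)
    rel_db = 10.0 * np.log10(psd / psd[0])       # dB relative to 0 Hz
    below = np.where(rel_db <= -drop_db)[0]
    return freqs[below[0]] if below.size else None
```

A Welch-averaged spectrum would give a smoother estimate on noisy experimental data; this minimal version illustrates only the thresholding logic.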

4.1 Simulation results and discussion

The robot tracking process is simulated in the MATLAB/Simulink environment. The robot model of a 6-DOF manipulator (Fig. 9, which is used in our experiments) is set up using the MATLAB/SimMechanics toolbox, and the model physical parameters are set according to the given robot URDF file. The friction parameters of each joint in Eq. (28) are set according to the system dynamic identification results:

$$ \begin{gathered} {\mathbf{f}}_{c1} \, = \, [{5}{\text{.6665}},{ 2}{\text{.951}},{ 2}{\text{.7750}},{ 2}{\text{.9656}},{ 1}{\text{.4458}},{ 1}{\text{.5185}}], \hfill \\ {\mathbf{f}}_{c2} \, = \, [{10}{\text{.4242}},{ 13}{\text{.1298}},{ 9}{\text{.6565}},{ 3}{\text{.5454}},{ 2}{\text{.4864}},{ 2}{\text{.0506}}], \hfill \\ \end{gathered} $$

To simulate the measurement noise and communication delays, the feedback joint angles were corrupted by zero-mean white noise with a standard deviation of 0.001° and a time delay of 0.001 s. According to the robot manual, the joint torques are limited to the corresponding rated torques of 85 Nm, 85 Nm, 40 Nm, 40 Nm, 10 Nm, and 10 Nm, respectively.
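These simulation imperfections can be reproduced with a few lines; the following sketch uses the stated noise level, one-step delay (0.001 s at 1 kHz), and rated-torque limits, with hypothetical helper names:

```python
import numpy as np

RATED = np.array([85.0, 85.0, 40.0, 40.0, 10.0, 10.0])  # Nm, from the manual

def corrupt_measurement(q_hist, k, delay_steps=1, noise_std=0.001, rng=None):
    """Joint-angle feedback at step k with a one-step (0.001 s) delay and
    zero-mean white noise of standard deviation 0.001 deg."""
    rng = rng or np.random.default_rng()
    q_delayed = q_hist[max(k - delay_steps, 0)]
    return q_delayed + rng.normal(0.0, noise_std, size=q_delayed.shape)

def saturate_torque(u):
    """Clip commanded torques to the rated joint torques."""
    return np.clip(u, -RATED, RATED)
```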

First, we compare the control performance of the control methods mentioned above. In this simulation case, all joints simultaneously track reference square-wave signals with an amplitude greater than 10°, which is considered a wide range for the robot to track. No extra disturbance torque is applied to the system in this case. Figure 6 shows the tracking curves of the joint angles under these comparative controllers. All joints of these controllers were optimally tuned using the same methodology of Sect. 3, and a description of the joints can be seen in Fig. 9.

Fig. 6
figure 6

Tracking trajectories of robot under different control methods

As can be seen in Fig. 6, the PID controller exhibits good rapidity but has a certain overshoot and a relatively long settling time. The LADRC method also shows good rapidity with almost no overshoot. The SMC method has a smooth tracking performance, but its rising speed is a little slow. The DRSMC method provides the best tracking performance considering both rapidity and steady-state errors, which demonstrates that the proposed control strategy can achieve promising tracking performance. The control torques (control inputs) of the comparative controllers are shown in Fig. 7.

Fig. 7
figure 7

Control torque (control input) of the comparative controllers

To illustrate the robustness of these controllers against unknown time-varying disturbances, a disturbance torque is applied to each joint simultaneously. The controller parameters remain the same, and the reference trajectory is a sine wave (amplitude 11.46°, frequency 1.0 Hz). We consider a time-varying disturbance formed by the superposition of a sine-wave disturbance and a constant disturbance, as follows:

$$ dis_{i} ({\text{Nm}}) = \left\{ \begin{gathered} 0, \quad t < t_{1} \hfill \\ Tq_{i} + Tq_{i} \sin (2\pi f_{dis} (t - t_{1} )), \quad {\text{else}} \hfill \\ \end{gathered} \right. $$
(29)

where \(Tq_{i}\) is chosen as 25% of the maximum output torque of the i-th joint. The sine-wave disturbance frequency \(f_{dis}\) is chosen as 4 Hz, which is the bandwidth of the potential external disturbances in our system (see Fig. 5). We should note that the higher the applied disturbance frequency, the larger the tracking error; here we give the results for a 4-Hz disturbance since it is the upper-limit frequency of the disturbances and thus provides the worst condition for the robot controller.
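The disturbance signal of Eq. (29) can be generated directly; a vectorized sketch with the values used in this simulation as defaults (function name assumed):

```python
import numpy as np

def disturbance(t, Tq_i, f_dis=4.0, t1=4.0):
    """Time-varying disturbance of Eq. (29): zero before t1, then a constant
    term plus a sine wave, both with amplitude Tq_i, at frequency f_dis."""
    t = np.asarray(t, dtype=float)
    d = Tq_i + Tq_i * np.sin(2.0 * np.pi * f_dis * (t - t1))
    return np.where(t < t1, 0.0, d)
```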

The trajectory tracking curves of the joint errors against unknown time-varying disturbances are shown in Fig. 8, where the disturbance is applied at \(t=4.0\mathrm{ s}\). We can observe that the DRSMC method provides the least peak-valley error and average error after a short settling time of suppressing disturbances. The simulation results verify that the proposed DRSMC method can adequately suppress a wide range of unknown time-varying disturbances compared to the PID, LADRC, and SMC methods.

Fig. 8
figure 8

Tracking trajectories of robot against unknown time-varying disturbances under different control methods

4.2 Experimental results and discussion

The robot used in the experiments is a 6-DOF Elfin collaborative robot, as shown in Fig. 9. A PC-based controller processes the data and controls the robot directly via an EtherCAT bus. The real-time (RT) running frequency of the robot system is set to 1 kHz (the bus cycle is 1 ms). The control algorithm is implemented in the ROS environment under an RT-Linux kernel. The parameters of the PID controller used in the data acquisition phase in Sect. 2 are set as P = 10, I = 0.1, and D = 5.

Fig. 9
figure 9

Illustration of 6-DOF experimental robot system

In the first experiment, we compare the control performance of the different control methods with no extra disturbance torque applied to the system. All joints simultaneously track wide-range reference square-wave signals. All controllers are further tuned with proper gains to provide a good closed-loop tracking response, considering a compromise between response rapidity and steady-state error. The experimental results are shown in Fig. 10, in which the dotted lines represent the reference trajectory of each joint and the solid lines represent the joint tracking trajectories under the comparative control strategies.

Fig. 10
figure 10

Tracking trajectories of experiment robot under different control methods

Table 1 Step-response performance characteristics of four comparative control strategies

As can be seen in Fig. 10, the conventional PID controller can barely control the robot to track trajectories with a large-amplitude step change. This occurs because the required steady-state error and system stability restrict the parameter tuning range. Generally speaking, the ADRC scheme or the SMC approach can improve the system control performance relative to the PID controller. For further discussion, four representative step-response performance characteristics are calculated to compare these control strategies and are given in Table 1. \(t_{r} , \, t_{s} , \, \sigma \% ,{\text{ and }}e_{ss}\) are the average rise time (here, the rise time is the first time that the response curve reaches 90% of the stable-state value), average settling time within the 2% error band, average percentage overshoot, and average absolute steady-state error of the six robot joints, respectively. The first three characteristics \(t_{r} , \, t_{s} , \, \sigma \%\) reflect the dynamic performance, and \(e_{ss}\) reflects the static performance. The control torques (control inputs) of the comparative controllers are shown in Fig. 11.
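These step-response characteristics can be extracted from a recorded response with a short routine. A single-joint sketch under our own assumptions (the response is assumed to have settled by the end of the record; function name hypothetical):

```python
import numpy as np

def step_metrics(t, y, y_final, band=0.02):
    """Rise time (first crossing of 90% of the stable value), settling time
    (last exit from the 2% band), percentage overshoot, and absolute
    steady-state error for a single-joint step response."""
    t_r = t[np.argmax(y >= 0.9 * y_final)]          # first 90% crossing
    outside = np.abs(y - y_final) > band * abs(y_final)
    t_s = t[np.max(np.where(outside)[0]) + 1] if outside.any() else t[0]
    overshoot = max(0.0, (np.max(y) - y_final) / y_final * 100.0)
    e_ss = abs(y[-1] - y_final)
    return t_r, t_s, overshoot, e_ss
```

Averaging these values over the six joints yields the entries reported in Table 1.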

Table 2 Disturbance rejection ability of four comparative control strategies
Fig. 11
figure 11

Control torque (control input) of the comparative controllers

The results demonstrate that the LADRC method can greatly reduce the system settling time (by about 58.3%) and percentage overshoot (by about 29.6%) compared with the PID controller. Meanwhile, the steady-state error is reduced by about 27.6%. This shows that the usage of TD and ESO can balance the response rapidity and steady-state error because the discontinuous reference trajectories can be smoothed by TD to bound the calculated errors, and the ESO provides estimation and compensation of unknown disturbances. However, further decreasing the system steady-state error on the basis of maintaining a satisfying response rapidity can be difficult under the LADRC method since the ESO cannot accurately estimate and compensate the joint static friction. Then, the problem becomes similar to that for the PID controller.

The SMC method exhibits better tracking performance than LADRC method and can reduce the system settling time by about 21.5% and decrease the steady-state error by about 26.7%. The reason is that the designed sliding motion guarantees the control performance of the nominal system. The proposed DRSMC method has the best tracking performance with regard to the response rapidity and steady-state error and can have a similar rise time as the PID controller, which approximates the shortest rise time our system can provide. Meanwhile, its settling time is also the shortest and approximates the rise time, which indicates that the corresponding transient period is smooth and steady. In addition, the DRSMC method mostly exhibits no overshoot or steady-state error owing to the sliding motion and the ESO compensation.

The results of the contrast experiments using the above four controllers against unknown time-varying disturbances are shown in Fig. 12, and the representative characteristics are calculated in Table 2, where \(M_{p} , \, M_{p - p} ,\) and \(M_{a}\) represent the average maximum joint error, the average peak-valley error in one cycle, and the average joint error over 10 stable cycles of the 6 robot joints after the disturbances are applied, respectively. The disturbances are given by Eq. (29), with a disturbance frequency of 4 Hz and \(t_{1}\) chosen as 0.5 s. It should be noted that the applied disturbances are comparatively significant, with a peak value of 50% of the rated torque at the upper-bound disturbance frequency. All controller parameters remain the same as in the first experiment. The control torques (control inputs) of the comparative controllers are shown in Fig. 13.
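The characteristics of Table 2 can likewise be computed from the joint-error traces. A per-joint sketch under our own assumptions (the peak-valley error is taken over the final disturbance cycle rather than averaged over all cycles; function name hypothetical):

```python
import numpy as np

def disturbance_metrics(t, e, t1, f_dis=4.0, n_cycles=10):
    """Joint-error characteristics of Table 2 for one joint: maximum error
    after the disturbance is applied (M_p), peak-valley error over one
    disturbance cycle (M_p-p), and mean absolute error over the last
    n_cycles stable cycles (M_a)."""
    after = t >= t1
    M_p = np.max(np.abs(e[after]))
    cycle = 1.0 / f_dis
    last = t >= (t[-1] - cycle)
    M_pp = np.max(e[last]) - np.min(e[last])
    stable = t >= (t[-1] - n_cycles * cycle)
    M_a = np.mean(np.abs(e[stable]))
    return M_p, M_pp, M_a
```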

Fig. 12
figure 12

Tracking trajectories of joint errors against unknown time-varying disturbances under different control methods

Fig. 13
figure 13

Control torque (control input) of the comparative controllers

From Fig. 12 and Table 2, we can observe that the PID controller has the largest error caused by applied dynamic disturbances and the largest stable error mainly caused by constant disturbances. This shows that it is difficult for the PID controller to suppress strong time-varying disturbances regardless of whether they are dynamic or constant. Compared with the PID method, the LADRC method can effectively suppress the time-varying disturbances, especially constant disturbances. According to (14), we know that the ESO can estimate bounded disturbances and precisely estimate the constant disturbance. Therefore, the LADRC method can exhibit good performance despite unknown disturbances.

The SMC method can greatly decrease the maximum error caused by dynamic disturbances since the sliding motion along the sliding surface can rapidly reduce the joint error after disturbances are applied. However, its stable error is larger than that of the LADRC method. The proposed DRSMC method can combine the advantages of the LADRC method and SMC method, which can largely suppress both dynamic disturbances and constant disturbances. Figure 14 illustrates the suppressing disturbance ability of the DRSMC method under a given disturbance bandwidth. The disturbance frequencies in \(d_{1} , \, d_{2} , \, d_{3} ,\) and \(d_{4}\) are chosen as 0 Hz, 0.5 Hz, 1 Hz, and 4 Hz.

Fig. 14
figure 14

Tracking trajectories of joint errors against different disturbance frequencies under DRSMC method

As shown in Fig. 14, the DRSMC method has a great ability for disturbance rejection over the disturbance frequency range; furthermore, the DRSMC can greatly suppress low frequencies, especially static disturbances. The joint tracking error can be reduced to less than 0.01° over the − 3 dB bandwidth even when the disturbance has a peak value of 50% of the rated torque. These experimental results prove that the proposed DRSMC method has a strong ability for unknown time-varying disturbance rejection and good application potential.

5 Conclusions

A practical and effective trajectory tracking control framework with strong disturbance rejection ability for robots was presented in this paper. Combining the active disturbance rejection scheme with sliding mode control, an ESO-based SMC law was developed to realize effective trajectory tracking while actively estimating and compensating unknown disturbances and system uncertainties. A learning-based controller design methodology was introduced to realize the optimal design of the proposed controller, and an autonomous learning method was developed to capture the robot joint actuation ability.

To obtain a robust and transferable controller, a neural network was used to learn the joint actuation ability for the controller optimization process. Simulation and experimental results verified that the proposed controller design methodology is effective and robust and that the proposed control strategy achieves satisfying tracking performance with strong disturbance rejection ability. As an extension of this research project, future work will develop a scheme to adaptively adjust the controller online and use richer feedback information to create a control policy for specific applications.