Abstract
This paper develops a robust optimal tracking control law for a class of nonlinear systems. A characteristic of this paper is that the designed controller guarantees both robustness and optimality under nonlinearity and mismatched disturbances. Since optimal controllers for nonlinear systems are difficult to obtain, a reinforcement learning method is adopted with two neural networks (NNs) approximating the cost function and the optimal controller, respectively. We design weight update laws for the critic NN and the actor NN based on gradient descent and stability, respectively. In addition, matched and mismatched disturbances are estimated by fixed-time disturbance observers, and a transformation based on the backstepping method is employed to convert the system into a filtered error nonlinear system. Through a rigorous analysis using the Lyapunov method, we demonstrate that the states and estimation errors remain uniformly ultimately bounded. Finally, the effectiveness of the proposed method is verified through two illustrative examples.
1 Introduction
The objective of optimal tracking control is to develop a controller that ensures the system’s output tracks a specified reference signal while minimizing a specific performance index. This field has attracted significant attention and research, finding applications in practical domains such as chaotic systems, helicopters, permanent magnet synchronous motors, dispatch, and electric vehicles [1,2,3,4,5]. Optimal control techniques rely on Pontryagin’s minimum principle. For linear systems, obtaining the optimal control involves solving the algebraic Riccati equation, as shown in [6]. For nonlinear systems, the optimal control necessitates the solution of the nonlinear Hamilton-Jacobi-Bellman (HJB) equation. Despite the practical utility of optimal control, the conventional methodology encounters a significant challenge, namely, the difficulty of solving the nonlinear HJB equation for higher-order systems [7,8,9,10].
In recent years, numerous efforts have been made to obtain the optimal controller, including inverse optimal control, \(\theta\)-D techniques, numerical approximation methods, and others [11, 13, 14]. The inverse optimal control method, presented in [11, 12], offers a solution that avoids the need to solve the HJB equations. For nonlinear systems, a suboptimal control approach was proposed in [13]. Another approach, described in [14], employed a \(\theta\)-D approximation method to solve the HJB equation by transforming it into state-dependent Lyapunov equations. It is important to note that these methods, although effective, are typically performed offline. Consequently, when the system parameters change, the control performance may degrade. To address this issue, researchers have explored the integration of reinforcement learning and adaptive control with optimal control [7, 15,16,17,18,19,20,21].
Approximate dynamic programming (ADP), proposed by [7] in 1992, utilizes function approximation structures to approximate the cost function and control strategy in the dynamic programming equation. ADP has been developed in subsequent works [15,16,17] using neural networks (NNs) to achieve optimal tracking control. These methods have been thoroughly studied and widely adopted [18, 24]. Furthermore, advancements in hardware have paved the way for data-driven approaches in optimal control. For example, [22] introduced a computational adaptive optimal controller for linear systems with completely unknown dynamics. Nonlinear adaptive optimal control was achieved through value iteration and ADP, as described in [23].
Inspired by this, we have incorporated the principles of adaptive and reinforcement learning to develop efficient tracking controllers using an actor-critic approach. Nevertheless, previous studies such as [25, 26] have highlighted a limitation of optimal tracking control, which involves the introduction of a discount factor into the performance index. This factor is intended to prevent the index from growing indefinitely, but it can hinder the convergence of the system state to zero. To address this issue, our paper proposes a reinforcement learning-based tracking control technique that utilizes a filtered error system, thereby eliminating the need for a discount factor.
In practical systems, the presence of disturbances is an inevitable issue [27, 28, 35]. These disturbances encompass both internal environmental factors, such as unmodeled dynamics, perturbed model parameters, and structural perturbations, as well as external environmental disturbances [37]. To achieve desired control outcomes, including improved disturbance rejection, fast dynamic response, and minimal steady-state error, it is crucial to explore highly reliable controllers. Extensive research has been conducted on various anti-disturbance control methods, such as robust control [29], sliding mode control [30, 31], and output regulation theory [32]. Among these methods, two approaches have gained attention for their ability to achieve fast disturbance suppression based on system dynamics: disturbance observer-based control and active disturbance rejection control [33,34,35]. By employing disturbance observers or extended state observers to estimate and actively compensate for disturbances, their influence can be effectively mitigated [35].
However, mismatched disturbances are difficult to handle, as highlighted in [36, 37]. In [37], the authors proposed a composite control strategy based on the backstepping method for higher-order nonlinear systems with non-vanishing disturbances. By incorporating estimation information of the disturbance at each step of the virtual control, the output is regulated to zero. While this method effectively handles mismatched disturbances, it is not optimal for two reasons. First, the nonlinearity is canceled at each step of the virtual control process. Second, the gain of the virtual control is assigned artificially and only satisfies the condition for making the derivative of the Lyapunov function negative definite. Therefore, we employ the concept of backstepping to construct a filtered error system that retains the nonlinear terms, ensuring optimality in dealing with mismatched disturbances.
Furthermore, the majority of existing studies focus on achieving asymptotic estimates of disturbances, implying that estimation errors persist even as the system converges. To mitigate the impact of disturbances, researchers have proposed fixed-time observers [38,39,40]. This approach involves estimating unknown disturbances within a predetermined time period, thereby minimizing their subsequent effects. In our study, we also employ a fixed-time disturbance observer (FTDOB) to estimate disturbances and reduce their influence on the neural network training process.
Therefore, this paper aims to address the limitations of existing optimal control methods and anti-disturbance methods in order to tackle more complex scenarios. The primary contributions of this paper are as follows:
-
Two neural networks are utilized to implement an actor-critic network, enabling the approximation of both the optimal control and cost function.
-
The fixed-time algorithm is employed in the design of the observer, allowing for the estimation of disturbances over a predetermined time interval, thereby enhancing the reliability of the control strategy.
-
Filtered error systems are constructed to attain an optimal controller for high-order nonlinear systems affected by mismatched disturbances.
The rest of the paper is organized as follows. In Sect. 2, the system description and some necessary definitions are given. Section 3 presents the main results on disturbance observer design and controller design. Simulation examples are given in Sect. 4, and the conclusion is given in Sect. 5.
2 System descriptions and some preliminaries
Consider the following disturbed nonlinear system in strict-feedback form, \({\dot{x}}_{i}=x_{i+1}+f_{i}+d_{i}\), \(i=1,2,\ldots ,n-1\), \({\dot{x}}_{n}=u+f_{n}+d_{n}\),
where \(x_{i}\), \(d_{i}\), \(f_{i}\), \(i=1, 2,\ldots , n\), denote the system states, disturbances, and nonlinear functions, respectively, and u is the control input. Complete state information is assumed to be available.
Assumption 1
There exists a sufficiently small constant \(\xi\) such that \(\Vert {\dot{d}}\Vert <\xi\).
Here, we recall optimal control theory [6]. For the nominal system, i.e., without the disturbance, a cost function is given as \(J=\int _{0}^{\infty }\big (Q(x)+u^{\textsf {T}}Ru\big )\,\mathrm{d}t\),
where Q(x) is positive definite function and R is symmetric positive definite constant matrix. Define \(\frac{\partial J}{\partial x}=\nabla J\) and choose the Hamilton function as \(H= \nabla J^{\textsf {T}}{\dot{x}}+Q+u^{\textsf {T}}Ru\). Then, optimal value function \(J^{*}\) meets \(0= \min _{u}[H(x,u,\nabla J^{*})]\). With optimal control policy \(u^{*}\), the HJB equation becomes
Then, we have the optimal control input \(u^{*}\) as \(u^{*}=-\frac{1}{2}R^{-1}g^{\textsf {T}}\nabla J^{*}\).
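The structure of this control law can be checked on a scalar example, where the HJB equation collapses to a quadratic algebraic Riccati equation with a closed-form root. The system \({\dot{x}}=ax+bu\) below is a hypothetical illustration, not one of the paper's examples:

```python
import math

# Scalar linear system x_dot = a*x + b*u with cost integral(q*x^2 + r*u^2) dt.
# With J*(x) = p*x^2, the HJB equation reduces to the scalar algebraic
# Riccati equation:  2*a*p - (b*p)**2 / r + q = 0.
a, b, q, r = -1.0, 1.0, 1.0, 1.0

# Positive root of the quadratic (b^2/r)*p^2 - 2*a*p - q = 0.
p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

def u_star(x):
    # Optimal feedback u* = -(1/2) R^{-1} g^T dJ*/dx = -(b*p/r) * x.
    return -(b * p / r) * x

residual = 2 * a * p - (b * p) ** 2 / r + q
print(p, residual)   # p = sqrt(2) - 1 ≈ 0.4142, residual ≈ 0
```

For higher-order systems no such closed form exists, which is exactly the difficulty the NN approximation below addresses.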
The existing optimal control methods face two challenges: (1) robustness in the presence of disturbances, especially mismatched disturbances; (2) the complexity of the nonlinear HJB equation, whose solution is very resource-intensive. Hence, we propose a robust optimal control strategy based on NNs and disturbance observers, which is given in detail in Sect. 3. Next, we provide a definition used in the subsequent analysis.
Definition 1
The equilibrium \(x_{e}\) of system (1) is uniformly ultimately bounded (UUB) if there is a compact set \(S\subset {\mathbb {R}}^{n}\) such that, for any initial value \(x_{0}\in S\) and initial time \(t_{0}\), there is an upper bound B and a time \(T(B,x_{0})\) such that \(\Vert x(t)-x_{e}\Vert \le B\) for all \(t>t_{0}+T\).
3 Main results
The classic control method usually adopts the idea of feedback control plus feedforward control [35], but it has the following two shortcomings: (1) an asymptotically convergent observer causes the estimation error to persist; (2) feedback control can only stabilize the system, without optimality. This paper avoids these shortcomings by fusing fixed-time estimation with reinforcement learning. Figure 1 visually represents the core concepts discussed in this paper. The output of the system is directly used as the input of the disturbance observer. By choosing the observer gains reasonably, complete tracking of the disturbance can be realized within a fixed time. Then, the original system, together with the disturbance estimates, is transformed into a filtered error system, which enables us to deal with mismatched disturbances well. Under the framework of optimal control, a reinforcement learning method relying on actor and critic NNs is proposed. By training the NNs, the optimal controller of the error system is obtained.
Firstly, we design the fixed-time disturbance observers. With the disturbance estimates in hand, the system is then transformed into a filtered error system.
3.1 Fixed-time disturbance observer design
The fixed-time disturbance observer is designed for each channel as
where \(i=1, 2,\ldots , n\), \(z_{i1}\), \(z_{i2}\) are estimates of \(x_{i}\) and \(d_{i}\), \(\lambda _{1}\), \(\lambda _{2}\), \(\lambda _{3}\), \(\lambda _{4}\) are observer gains to be designed, and \(\alpha _{1}\), \(\alpha _{2}\), \(\beta _{1}\), \(\beta _{2}\) are observer internal parameters.
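The observer equations themselves appear as a display in the source. As a sketch, the simulation below uses one common fixed-time observer form with the same gain and parameter names on a hypothetical scalar channel; the specific correction terms, gain values, and exponents are assumptions, not necessarily the paper's design:

```python
import math

def sig(e, a):
    # Signed power |e|^a * sign(e), the standard fixed-time correction term.
    return abs(e) ** a * math.copysign(1.0, e)

# Hypothetical scalar channel x_dot = f(x) + u + d with known f and u.
# Assumed observer form (one common fixed-time design):
#   z1_dot = z2 + f + u + lam1*sig(e,alpha1) + lam2*sig(e,beta1),  e = x - z1
#   z2_dot =              lam3*sig(e,alpha2) + lam4*sig(e,beta2)
lam1, lam2, lam3, lam4 = 10.0, 10.0, 50.0, 50.0
alpha1, beta1 = 0.7, 1.3
alpha2, beta2 = 2 * alpha1 - 1, 2 * beta1 - 1   # 0.4 and 1.6

dt, T = 1e-3, 5.0
x, z1, z2 = 0.0, 0.0, 0.0
for k in range(int(T / dt)):
    t = k * dt
    d = math.sin(t)                       # disturbance to be estimated
    f, u = -x, 0.0
    e = x - z1
    x  += dt * (f + u + d)
    z1 += dt * (z2 + f + u + lam1 * sig(e, alpha1) + lam2 * sig(e, beta1))
    z2 += dt * (lam3 * sig(e, alpha2) + lam4 * sig(e, beta2))

print(abs(z2 - math.sin(T)))   # estimation error after the transient
```

With these (assumed) gains the disturbance estimate \(z_{2}\) settles to a small chattering band around \(d\) well before the end of the run.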
Theorem 1
Given system (1), if the observer gains are chosen properly, the disturbance can be estimated within a fixed time \(T_{d}\), which is independent of the initial values.
Proof
Define the estimation errors as \(e_{i1}=x_{i}-z_{i1}\), \(e_{i2}=d_{i}-z_{i2}\). Differentiating \(e_{i1}\) and \(e_{i2}\) with respect to time gives
As long as the observer gains are chosen carefully, the estimation error is fixed-time convergent, and the error dynamics can be written as \({\dot{e}}=\Lambda (e)+D\), \(D=[0,\quad {\dot{d}}_{i}]^{\textsf {T}}\). The rest of the proof is similar to [31] and is omitted here. \(\square\)
Under the designed observer, the mismatched disturbance can be handled. With the help of the backstepping method, the filtered error is obtained as \({\dot{z}}_{1}=x_{2}+f_{1}+d_{1}-{\dot{r}}\), where r is the reference signal. Here, we denote \(z_{2}=x_{2}-x_{2}^{*}\) and choose \(x_{2}^{*}=-k_{1}z_{1}-{\hat{d}}_{1}+{\dot{r}}\); then \({\dot{z}}_{1}={\dot{x}}_{1}-{\dot{r}}=z_{2}-k_{1}z_{1}+f_{1}+e_{1}\). Likewise, we have
Then (7) is rewritten as
where \(Z=[z_{1},z_{2},\ldots ,z_{n}]^{\textsf {T}}\), \(F(Z)=[z_{2}+f_{1}-k_{1}z_{1}, \ldots , f_{n}-k_{n}z_{n}]^{\textsf {T}}\), \(G=[0,0,\ldots , 1]^{\textsf {T}}\).
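The filtered-error identity \({\dot{z}}_{1}=z_{2}-k_{1}z_{1}+f_{1}+e_{1}\) can be checked numerically on a second-order instance of system (1) with \(r=0\) and a perfect disturbance estimate (so \(e_{1}=0\)). The plant below (\(f_{1}=\sin x_{1}\), constant disturbances) is a hypothetical illustration, not the paper's example:

```python
import math

# Hypothetical second-order plant: x1_dot = x2 + f1(x1) + d1,
#                                  x2_dot = u  + f2(x)  + d2,
# with f1 = sin, f2 = 0.1*x1*x2.  With r = 0 and a perfect estimate
# d1_hat = d1 (so e1 = 0), the backstepping change of coordinates is
#   z1 = x1,  x2_star = -k1*z1 - d1_hat,  z2 = x2 - x2_star,
# and the filtered-error identity z1_dot = z2 - k1*z1 + f1 should hold.
k1 = 2.0
d1, d2 = 0.5, -0.3
f1 = math.sin
dt = 1e-3
x1, x2 = 1.0, -0.5

for k in range(2000):
    z1_prev = x1                                   # z1 = x1 - r, r = 0
    u = -x1 - x2                                   # any test input
    x1_new = x1 + dt * (x2 + f1(x1) + d1)          # explicit Euler step
    x2_new = x2 + dt * (u + 0.1 * x1 * x2 + d2)
    # Finite-difference z1_dot versus the filtered-error right-hand side.
    z2 = x2 - (-k1 * z1_prev - d1)
    lhs = (x1_new - z1_prev) / dt
    rhs = z2 - k1 * z1_prev + f1(z1_prev)
    assert abs(lhs - rhs) < 1e-9                   # identity holds exactly here
    x1, x2 = x1_new, x2_new

print("filtered-error identity verified")
```

Note that the nonlinearity \(f_{1}\) is carried into the right-hand side rather than canceled, which is what preserves optimality in the transformed coordinates.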
Remark 1
Canceling the nonlinearity at each step of the backstepping method leads to a nonoptimal controller, as the nonlinearity may actually be beneficial in meeting the stabilization and/or performance objectives [11].
Remark 2
During the actual production process, the controlled system often encounters abrupt disturbances that can be characterized as lumped disturbances [35]. These types of disturbances do not satisfy the assumption we initially made (referred to as Assumption 1). Nevertheless, the proposed control strategy exhibits the capability to stabilize the system and demonstrates a certain level of robustness. This is attributed to the fact that even in the presence of sudden disturbance changes, the designed observer is able to estimate the disturbance within a fixed time. It is worth noting that the nonlinear function employed in the controller design is represented as \(f+e\). However, since the term e exists only momentarily and eventually diminishes to zero, the overall effect on the controller’s performance is minimal.
According to the former section, we define \(\frac{\partial J}{\partial Z}=\nabla J\) and the Hamilton function is chosen as \(H= \nabla J^{\textsf {T}}{\dot{Z}}+Z^{\textsf {T}}QZ+u_{o}^{\textsf {T}}Ru_{o}\). Then, we have the optimal control as \(u^{*}= \arg \min \limits _{u_{o}}[H(Z,u_{o},\nabla J^{*})]=-\frac{1}{2}R^{-1}g^{\textsf {T}}\nabla J^{*}\), satisfying \(0= Q+u_{o}^{*\textsf {T}}Ru_{o}^{*}+\nabla J^{*\textsf {T}}(F+Gu_{o}^{*})\).
3.2 Critic NN design
The cost function is approximated by a critic neural network,
where W denotes the ideal neuron weights, \(\phi (Z): R^{n}\rightarrow R^{N}\) is the NN activation function vector, N stands for the number of neurons in the hidden layer, and \(\epsilon (Z)\) is the approximation error. As \(N\rightarrow \infty\), \(\epsilon (Z)\rightarrow 0\). Since the neural network weight W is unknown, the output of the neural network can be expressed as
where \({\hat{W}}_{c}\) is the estimation of W.
Considering (9) and (10), the corresponding Hamilton functions are rewritten as
and
where \(\upsilon _{H}=\nabla \epsilon (F+Gu_{o})\).
Define critic NN approximation error \({\tilde{W}}_{c}=W-{\hat{W}}_{c}\), then we have
Given any admissible control policy, it is desired to select \({\hat{W}}_{c}\) to minimize the quadratic error
The normalized gradient algorithm is adopted to tune the critic weights
where \(\sigma _{1}=\nabla \phi (F+Gu_{o})\), \((\sigma _{1}^{\textsf {T}}\sigma _{1}+1)^{2}\) is used for normalization, and \(a_{1}\) is a scalar gain to be designed.
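The normalized-gradient rule is easiest to see on a scalar example whose value function is known in closed form. The sketch below Euler-discretizes the described update for a hypothetical plant \({\dot{x}}=-x+u\) under a fixed admissible policy; the plant, basis, and all numerical choices are assumptions for illustration:

```python
import math, random

# Scalar plant x_dot = -x + u, cost integrand q*x^2 + r*u^2 (q = r = 1),
# fixed admissible policy u = -p*x with p = sqrt(2) - 1 (the optimal gain).
# Critic: J_hat = W_c * phi(x) with phi(x) = x^2, so grad(phi) = 2x and
# sigma1 = grad(phi)*(f + g*u).  The true weight for this policy is p.
random.seed(0)
p = math.sqrt(2.0) - 1.0
a1, dt = 20.0, 0.05
W_c = 0.0
for _ in range(5000):
    x = random.uniform(0.5, 2.0)                # PE surrogate: rich samples
    u = -p * x
    sigma1 = 2.0 * x * (-x + u)
    e_c = sigma1 * W_c + x * x + u * u          # Bellman residual
    # Normalized gradient descent (Euler-discretized tuning law).
    W_c -= dt * a1 * sigma1 / (sigma1 ** 2 + 1.0) ** 2 * e_c
print(W_c)   # ≈ sqrt(2) - 1 ≈ 0.4142
```

The normalization \((\sigma _{1}^{\textsf {T}}\sigma _{1}+1)^{2}\) keeps the effective step size bounded regardless of how large \(\sigma_{1}\) becomes along the sampled trajectories.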
As mentioned in [2, 18], the identification of the critic parameter needs to fulfill the persistent excitation (PE) condition. In order to satisfy this condition, there are numerous options available for the signal selection, as long as the PE condition outlined in [18] is met.
3.3 Actor NN design
According to (3), the optimal control could be written as \(-\frac{1}{2}R^{-1}G^{\textsf {T}}(\nabla \phi ^{\textsf {T}}W+\nabla \epsilon )\). Since the parameter W is unknown, we utilize an actor NN to approximate the control input. Then, the controller is represented as
where \({\hat{W}}_{a}\) denotes the estimated value of W.
Similarly, \({\hat{W}}_{a}\) should be designed to approach W as closely as possible. Here, the tuning law of the actor NN is
where \({\bar{D}}_{1}=\nabla \phi GR^{-1}G^{\textsf {T}}\nabla \phi ^{\textsf {T}}\), \(m=\frac{\sigma _{2}}{(\sigma _{2}^{\textsf {T}}\sigma _{2}+1)^{2}}\), \(\sigma _{2}=\nabla \phi (F+G{\hat{u}}_{o})\), and \(a_{2}\) is a scalar gain to be designed.
The following online algorithm facilitates the simultaneous tuning of the actor NN and the critic NN.
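As a simplified sketch of this simultaneous tuning, the loop below alternates the critic's normalized-gradient fit with an actor step that merely relaxes \({\hat{W}}_{a}\) toward \({\hat{W}}_{c}\). This replaces the paper's stability-based actor law with a policy-iteration-style simplification, but on the scalar plant \({\dot{x}}=-x+u\) (a hypothetical example) it shares the same fixed point:

```python
import math, random

# Policy-iteration-style sketch of the actor-critic loop on the scalar
# plant x_dot = -x + u with q = r = 1: the critic fits the value of the
# current actor policy by normalized gradient descent, then the actor is
# relaxed toward the critic weight.  This is a simplification of the
# paper's stability-based actor law, not the law itself.
random.seed(1)
W_c, W_a = 0.0, 0.0
a1, a2, dt = 20.0, 1.0, 0.05
for _ in range(30):                          # outer actor updates
    for _ in range(4000):                    # inner critic fit
        x = random.uniform(0.5, 2.0)
        u = -W_a * x                         # u_hat = -(1/2)R^{-1}g*2x*W_a
        sigma1 = 2.0 * x * (-x + u)
        e_c = sigma1 * W_c + x * x + u * u   # Bellman residual
        W_c -= dt * a1 * sigma1 / (sigma1 ** 2 + 1.0) ** 2 * e_c
    W_a += a2 * (W_c - W_a)                  # actor moves toward critic
print(W_a)   # ≈ sqrt(2) - 1, the optimal gain
```

Each outer pass is one step of the scalar map \(W\mapsto (1+W^{2})/(2(1+W))\), whose fixed point \(\sqrt{2}-1\) solves the scalar Riccati equation; the paper's simultaneous laws (15) and (17) tune both weights continuously instead of in alternation.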
3.4 Stability analysis
The following assumption is necessary for stability analysis in Theorem 2.
Assumption 2
[18] In equation (9), the NN approximation error, the NN activation functions, and their gradients are bounded on a compact set, i.e., \(\Vert \epsilon \Vert <b_{\epsilon }\), \(\Vert \phi \Vert <b_{\phi }\), \(\Vert \nabla \epsilon \Vert <b_{\epsilon _{x}}\), \(\Vert \nabla \phi \Vert <b_{\phi _{x}}\).
Theorem 2
Given system (8), critic NN updating law (15), actor NN updating law (17), and controller u=\(u_{o}\), there exists a positive integer \(N_0\) such that, if the number of hidden-layer units satisfies \(N > N_0\), the closed-loop system states, the critic NN approximation error, and the actor NN approximation error are UUB.
Proof
Choose the Lyapunov function as
where \(e=[e_{i1}, e_{i2}]^{\textsf {T}}\). Taking the derivative yields
Firstly, we have
Here we define \(\nabla \phi GR^{-1}G^{\textsf {T}}\nabla \phi ^{\textsf {T}}\) as \({\bar{D}}_{1}\), \(\nabla \epsilon ^{\textsf {T}}(F-\frac{1}{2}GR^{-1}G^{\textsf {T}}\nabla \phi ^{\textsf {T}}{\hat{W}}_{a})\) as \(\mu _{1}\) and we have \({\dot{J}}=W_{c}^{\textsf {T}}\sigma _{1}+\frac{1}{2}W_{c}^{\textsf {T}}{\bar{D}}_{1}{\tilde{W}}_{a}+\mu _{1}\). From the HJB Eq. (11), we have \(W^{T}\sigma _{1}=-Q-\frac{1}{4}W^{\textsf {T}}{\bar{D}}_{1}W+\upsilon _{H}\). Then, it has
In addition, we have
Based on the FTDOB, the error becomes 0 after \(T_{d}\) seconds. With \(\dot{{\hat{W}}}_{a}\) given in (17), we have
Then, \({\dot{V}}\) can be obtained as below by combining (6), (15), (17), and (21):
where \({\bar{\sigma }}=\frac{\sigma _{2}}{\sigma _{2}^{\textsf {T}}\sigma _{2}+1}\), \(m_{s}=\sigma _{2}^{\textsf {T}}\sigma _{2}+1\).
It is obvious that under Assumption 2,
As given in [18], \(\upsilon _{H}\) converges to 0 as the number of neurons increases. Hence, \(N_{0}\) can be selected such that \(\sup \nolimits _{x\in \Omega }\Vert \upsilon _{H}\Vert <\upsilon\). Assuming \(N>N_{0}\), if we define \({\tilde{Z}}=[Z,\quad {\tilde{W}}_{c},\quad {\tilde{W}}_{a},\quad e]^{\textsf {T}}\), then we have
where \(c=\frac{1}{4}\Vert W\Vert ^{2}\Vert {\bar{D}}_{1}\Vert +\upsilon +\frac{1}{2}\Vert W\Vert b_{\epsilon _{x}}b_{\phi _{x}}b_{g}^{2}\sigma _{\min }(R)\),
Let the parameters be chosen such that \(M>0\). If \(\Vert {\tilde{Z}}\Vert >\sqrt{\frac{p^{2}}{4\sigma _{\min }(M)}+\frac{c+\upsilon }{\sigma _{\min }(M)}}+\frac{\Vert p\Vert }{2\sigma _{\min }(M)}\), then \({\dot{V}}\) is negative. Hence, the states and the weight errors are UUB. \(\square\)
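The ball radius in the theorem comes from completing the square. Assuming the derivative bound collects into the quadratic form below (with the constants \(c\), p, M defined in the proof):

```latex
\begin{aligned}
\dot{V} &\le -\sigma_{\min}(M)\Vert \tilde{Z}\Vert^{2}
             + \Vert p\Vert \,\Vert \tilde{Z}\Vert + c + \upsilon \\
        &= -\sigma_{\min}(M)\Big(\Vert \tilde{Z}\Vert
             - \frac{\Vert p\Vert}{2\sigma_{\min}(M)}\Big)^{2}
             + \frac{\Vert p\Vert^{2}}{4\sigma_{\min}(M)} + c + \upsilon ,
\end{aligned}
```

so \({\dot{V}}<0\) once \(\Vert {\tilde{Z}}\Vert\) leaves a ball whose radius grows with \(c+\upsilon\) and shrinks as \(\sigma _{\min }(M)\) increases.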
4 Examples
In this section, a linear system is presented first to show that the designed update law guarantees convergence of the weights to their ideal values. Then, a nonlinear system example is employed to highlight the effectiveness of the proposed method.
4.1 Linear system example
Consider a linear system, \({\dot{x}}_{1}= -x_{1}-2x_{2}+u\), \({\dot{x}}_{2}= x_{1}-4x_{2}-3u\), where \(x_{1}\) and \(x_{2}\) are the system states and u is the control input. Choose the cost function as \(J=\int _{0}^{\infty }(x^{\textsf {T}}Qx+u^{\textsf {T}}Ru)\mathrm{{d}}t\), where \(Q=\textrm{diag}(1,1)\) and \(R=1\).
Clearly, the optimal controller based on linear quadratic regulator theory can be easily found. Hence, the ideal NN weights can also be deduced as \(W=[0.3199,\ -0.1162,\ 0.1292]\). For this system, the NN-based optimal control is implemented as (16) and the NN tuning laws are selected as (15) and (17). To ensure the PE condition during NN convergence, we add the noise signal \(0.5(\sin ^{2}(t)\cos (t)+\sin ^{2}(2t)\cos (0.1t)+\sin ^{2}(-1.2t)\cos (0.5t)+\sin ^{5}(t))\) to the control input. The reference signal is set as \(r=0\). The simulation results are shown in Fig. 2. The weights converge to the optimal values after 50 s, i.e., \({\hat{W}}_{c}=[0.3199,\ -0.1162,\ 0.1292]\) and \({\hat{W}}_{a}=[0.3199,\ -0.1162,\ 0.1292]\). The optimal controller approximated by NNs is given as
The excitation signal is introduced to satisfy the PE condition, so that sufficiently rich data are generated to train the neural network and ensure its convergence. After 80 s, the neural network has converged. After convergence, the exploration signal is removed, and the system state remains near 0 thereafter.
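The ideal weights quoted for this example can be cross-checked independently by integrating the Riccati differential equation to steady state. The pure-Python sketch below (Euler integration with an assumed step size) uses the quadratic basis \(\phi =[x_{1}^{2},\ x_{1}x_{2},\ x_{2}^{2}]\), so that \(W=[P_{11},\ 2P_{12},\ P_{22}]\):

```python
# Cross-check the ideal critic weights of the linear example by integrating
# the Riccati differential equation  P_dot = A'P + PA - P B R^{-1} B' P + Q
# (R = 1, Q = I) to steady state, using plain 2x2 list matrices.
A = [[-1.0, -2.0], [1.0, -4.0]]
B = [1.0, -3.0]                   # input column vector
dt, steps = 0.005, 20000
P = [[0.0, 0.0], [0.0, 0.0]]

for _ in range(steps):
    PB = [P[0][0] * B[0] + P[0][1] * B[1],
          P[1][0] * B[0] + P[1][1] * B[1]]
    dP = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            AtP = sum(A[k][i] * P[k][j] for k in range(2))   # (A'P)_{ij}
            PA  = sum(P[i][k] * A[k][j] for k in range(2))   # (PA)_{ij}
            dP[i][j] = AtP + PA - PB[i] * PB[j] + (1.0 if i == j else 0.0)
    for i in range(2):
        for j in range(2):
            P[i][j] += dt * dP[i][j]

# Quadratic basis [x1^2, x1*x2, x2^2]  =>  W = [P11, 2*P12, P22].
W = [P[0][0], 2.0 * P[0][1], P[1][1]]
print(W)   # close to the paper's W = [0.3199, -0.1162, 0.1292]
```

The flow converges because the optimal closed loop is stable, and its fixed point is the algebraic Riccati solution, matching the weights the critic and actor NNs converge to.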
4.2 Nonlinear system example
Firstly, we consider a reference signal \(r=0\). In this case, the tracking problem is actually a stabilization problem. The exploration signal is chosen as \(200e^{-0.23t}(\sin ^{2}(t)\cos (t)+\sin ^{2}(2t)\cos (0.1t)+\sin ^{2}(-1.2t)\cos (0.5t)+\sin ^{5}(t)+\sin ^{2}(1.12t)+\cos (2.4t)\sin ^{3}(2.4t))\) and the corresponding results are depicted in Fig. 3.
Then, we set \(r=5\) and the exploration signal as \(200e^{-0.35t}(\sin ^{2}(t)\cos (t)+\sin ^{2}(2t)\cos (0.1t)+\sin ^{2}(-1.2t)\cos (0.5t)+\sin ^{5}(t)+\sin ^{2}(1.12t)+\cos (2.4t)\sin ^{3}(2.4t))\). The results are depicted in Fig. 4.
In our simulation, the sampling time is relatively small at 0.001 s. Therefore, it is reasonable to increase the decay rate of the exponential term in the excitation signal. This offers several advantages, including reducing the overall training time and minimizing computational resource wastage. However, in practical systems, hardware limitations often prevent maintaining a very small sampling time. In such cases, as highlighted in [2, 3], it becomes crucial to ensure that the excitation signal does not decay too rapidly. This ensures an ample amount of data is available for training the neural network.
5 Conclusion
This paper focused on the design of robust optimal controllers for high-order nonlinear systems in the presence of mismatched disturbances. The proposed approach involves the design of disturbance observers that ensure fixed-time convergence. Subsequently, the original system is transformed into a filtered error nonlinear system. To address the challenges associated with solving Hamilton–Jacobi–Bellman (HJB) equations, the reinforcement learning method has been introduced. Two neural networks have been designed to approximate the cost function and the optimal control, respectively. By integrating these components, a robust optimal controller is finally obtained. The effectiveness of the proposed method has been validated through two illustrative examples.
Data availability
Data sharing not applicable as no new data were generated in this study.
References
Tang L, Gao Y, Liu YJ (2014) Adaptive near optimal neural control for a class of discrete-time chaotic system. Neural Comput Appl 25:1111–1117
Na J, Lv Y, Zhang K, Zhao J (2020) Adaptive identifier-critic-based optimal tracking control for nonlinear systems with experimental validation. IEEE Trans Syst Man Cybern Syst 52(1):459–472
Fan ZX, Li S, Liu R (2022) ADP-based optimal control for systems with mismatched disturbances: a PMSM application. IEEE Trans Circ Syst II Express Briefs 70(6):2057–2061
Fan ZX, Adhikary AC, Li S, Liu R (2020) Anti-disturbance inverse optimal control for systems with disturbances. Optim Control Appl Methods 44(3):1321–1340
Chen J, Li K, Li K, Yu PS (2021) Dynamic bicycle dispatching of dockless public bicycle-sharing systems using multi-objective reinforcement learning. ACM Trans Cyber-Phys Syst 5(4):1–24
Lewis FL, Vrabie DL, Syrmos VL (2012) Optimal control. Wiley
Werbos PJ (1992) Approximate dynamic programming for real-time control and neural modeling. In: Handbook of intelligent control: neural, fuzzy, and adaptive approaches
Wei Q, Zhu L, Song R, Zhang P, Liu D, Xiao J (2022) Model-free adaptive optimal control for unknown nonlinear multiplayer nonzero-sum game. IEEE Trans Neural Netw Learn Syst 33(2):879–892
Gao W, Jiang ZP (2016) Adaptive dynamic programming and adaptive optimal output regulation of linear systems. IEEE Trans Autom Control 61(12):4164–4169
Gao W, Jiang ZP, Lewis FL, Wang Y (2018) Leader-to-formation stability of multiagent systems: an adaptive optimal control approach. IEEE Trans Autom Control 63(10):3581–3587
Krstic M, Tsiotras P (1999) Inverse optimal stabilization of a rigid spacecraft. IEEE Trans Autom Control 44(5):1042–1049
Fan ZX, Adhikary AC, Li S, Liu R (2022) Disturbance observer based inverse optimal control for a class of nonlinear systems. Neurocomputing 500:821–831
Ming X, Balakrishnan SN (2005) A new method for suboptimal control of a class of non-linear systems. Optim Control Appl Methods 26(2):55–83
Do TD, Choi HH, Jung WJ (2015) \(\theta\)-D approximation technique for nonlinear optimal speed control design of surface-mounted PMSM drives. IEEE/ASME Trans Mechatron 20(4):1822–1831
Zhang H, Cui L, Zhang X, Luo Y (2011) Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Trans Neural Netw 22(12):2226–2236
Qin C, Zhang H, Luo Y (2014) Optimal tracking control of a class of nonlinear discrete-time switched systems using adaptive dynamic programming. Neural Comput Appl 24:531–538
Wang D, Liu D, Zhao D, Huang Y, Zhang D (2013) A neural-network-based iterative GDHP approach for solving a class of nonlinear optimal control problems with control constraints. Neural Comput Appl 22(2):219–227
Vamvoudakis KG, Lewis FL (2010) Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5):878–888
Yang W, Li K, Li K (2019) A pipeline computing method of SpTV for three-order tensors on CPU and GPU. ACM Trans Knowl Discov Data 13(6):1–27
Zhong K, Yang Z, Xiao G, Li X, Yang W, Li K (2022) An efficient parallel reinforcement learning approach to cross-layer defense mechanism in industrial control systems. IEEE Trans Parallel Distrib Syst 3(11):2979–2990
Liu C, Tang F, Hu Y, Li K, Tang Z, Li K (2021) Distributed task migration optimization in MEC by extending multi-agent deep reinforcement learning approach. IEEE Trans Parallel Distrib Syst 32(7):1603–1614
Jiang Y, Jiang ZP (2012) Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica 48(10):2699–2704
Bian T, Jiang Y, Jiang ZP (2014) Adaptive dynamic programming and optimal control of nonlinear nonaffine systems. Automatica 50(10):2624–2632
Wang D (2020) Robust policy learning control of nonlinear plants with case studies for a power system application. IEEE Trans Industr Inf 16(3):1733–1741
Zhao J, Yang C, Gao W, Modares H, Chen X, Dai W (2023) Linear quadratic tracking control of unknown systems: a two-phase reinforcement learning method. Automatica 148:110761
Modares H, Lewis FL (2014) Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 50(7):1780–1792
Chen WH (2004) Disturbance observer based control for nonlinear systems. IEEE/ASME Trans Mechatron 9(4):706–710
Yu B, Du H, Ding L, Wu D, Li H (2022) Neural network-based robust finite-time attitude stabilization for rigid spacecraft under angular velocity constraint. Neural Comput Appl 34:5107–5117
Zhou K, Doyle J, Glover K (1995) Robust and optimal control. Prentice Hall, New Jersey
Utkin V (1977) Variable structure systems with sliding modes. IEEE Trans Autom Control 22(2):212–222
Levant A (2003) Higher-order sliding modes, differentiation and output-feedback control. Int J Control 76(9–10):924–941
Huang J (2004) Nonlinear output regulation: theory and applications. SIAM
Ohishi K, Nakao M, Ohnishi K et al (1987) Microprocessor-controlled DC motor for load-insensitive position servo system. IEEE Trans Industr Electron 34(1):44–49
Han J (2009) From PID to active disturbance rejection control. IEEE Trans Industr Electron 56(3):900–906
Li S, Yang J, Chen WH, Chen X (2014) Disturbance observer-based control: methods and applications. CRC Press, Inc., Boca Raton
Li S, Yang J, Chen WH, Chen X (2012) Generalized extended state observer based control for systems with mismatched uncertainties. IEEE Trans Industr Electron 59(12):4792–4802
Sun H, Guo L (2017) Neural network-based DOBC for a class of nonlinear systems with unmatched disturbances. IEEE Trans Neural Netw Learn Syst 28(2):482–489
Cui B, Zhang L, Xia Y, Zhang J (2022) Continuous distributed fixed-time attitude controller design for multiple spacecraft systems with a directed graph. IEEE Trans Circ Syst II Express Briefs 69(11):4478–4482
Li X, Ma L, Mei K, Ding S, Pan T (2023) Fixed-time adaptive fuzzy SOSM controller design with output constraint. Neural Comput Appl 35(13):9893–9905
Liu W, Chen M, Shi P (2022) Fixed-time disturbance observer-based control for quadcopter suspension transportation system. IEEE Trans Circ Syst I Regul Pap 69(11):4632–4642
Ethics declarations
Conflict of interest
All authors declare that there are no conflicts of interest in this paper.
About this article
Cite this article
Fan, ZX., Tang, L., Li, S. et al. Reinforcement learning-based robust optimal tracking control for disturbed nonlinear systems. Neural Comput & Applic 35, 23987–23996 (2023). https://doi.org/10.1007/s00521-023-08993-0