Abstract
This paper designs two novel event-triggered control (ETC) schemes based on the critic learning technique for constrained discrete-time nonlinear systems. First, starting from the stability of the constrained system, a static ETC method is developed to reduce the computational burden. Then, a nonnegative dynamic variable is introduced into the static event-triggered mechanism, so as to establish the dynamic ETC method, which further improves the resource utilization rate and possesses the anti-interference ability. Moreover, a speedy value iteration architecture is designed to obtain an initially admissible optimal control policy, which can ensure the normal execution of the designed ETC methods. Finally, two experimental examples are provided to illustrate the effectiveness and superiority of the developed schemes.
1 Introduction
With the rapid development of intelligent control [1,2,3,4], adaptive dynamic programming (ADP) [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] is regarded as a promising scheme to accomplish intelligent optimization by introducing an evaluation component. This is mainly because numerical solutions of Hamilton-Jacobi-Bellman (HJB) equations can be approximately obtained by ADP algorithms. Therefore, ADP algorithms are often used by researchers in related fields to deal with complex nonlinear control problems. Value iteration [9, 10] and policy iteration [11] are the two main forms of ADP algorithms. The essence of value iteration is to obtain an approximately optimal control sequence through continuous iteration between policy evaluation and policy improvement. In particular, Al-Tamimi et al. [10] proved the convergence of the value iteration algorithm in theory, which greatly promoted the development of ADP algorithms. Compared with value iteration, the control policy generated by policy iteration possesses a stability guarantee, but policy iteration requires an initial stabilizing control law. In order to improve the iteration convergence speed, Ha et al. [12] constructed a new value iteration architecture. So far, a great deal of work has been conducted to solve various control problems using ADP methods, such as trajectory tracking control [14,15,16], robust control [17], networked control [18], event-triggered control (ETC) [19], and constrained control [20]. This fully demonstrates the applicability and great potential of ADP algorithms.
In the control process, we often encounter difficulties caused by actuator saturation, which may lead to degraded system performance or even loss of the stability guarantee. A large number of studies have shown that constraining the control input within a reasonable limit can not only effectively solve the actuator saturation problem but also ensure excellent control performance [21]. In general, the constraint effect can be divided into symmetric constraints [20] and asymmetric constraints [22]. In order to associate the ADP algorithm with control constraints, a dual ETC scheme with critic learning was developed to control constrained nonlinear systems [19]. For constrained linear systems, a feedback controller was designed to study the global stability problem in [23]. Nevertheless, unlike constrained linear systems, constrained nonlinear systems are more difficult to handle in the control process. Up to now, most previous work on constrained nonlinear systems has focused on application methods rather than the stability of the system, which leaves the theoretical support inadequate.
Due to the increasing scale of complex nonlinear systems, the communication burden problem is becoming increasingly serious. Therefore, control methods that can reduce the computational burden, such as ETC [24,25,26,27,28,29,30,31,32,33], have received extensive attention. The essence of ETC is to determine a satisfactory triggering condition: the control law is updated only when this triggering condition is violated, which improves resource utilization. This is more efficient than time-triggered control, which updates the control law at every time step. In addition, the stability of the controlled system must be ensured whenever the triggering condition is applied. Hence, it is necessary to prove the stability of the system under the ETC scheme when the event is not triggered. This phased updating control mode is particularly suitable for embedded systems and networked control systems [24]. Through in-depth study, ETC has evolved into two kinds: static ETC [28] and dynamic ETC [29]. Dynamic ETC is established by introducing a dynamic variable into the static ETC architecture, which further reduces the computational burden compared with static ETC. In addition, the triggering condition in the dynamic event-triggered mechanism (ETM) can adjust itself when interference is encountered, a capability that static ETC lacks. However, the dynamic variable to be designed is usually related to the triggering condition in the static ETM, which is not easy to construct. Thus, most of the relevant work to date has studied static ETC. In [30], an ETC scheme was developed to deal with the suboptimal tracking control problem for nonlinear systems. In [28], Wang et al. developed an event-based iterative critic learning algorithm and proved that the controlled system was stable from the perspective of input-to-state stability.
With the further study of ETC, relevant researchers have developed a dynamic ETC method [29]. In this control method, dynamic events are monitored and identified, and corresponding control strategies are adopted to maintain the stability and performance of the controlled system. In [31], a dynamic ETC method was designed for discrete-time linear systems. However, the dynamic ETC method for nonlinear systems remains to be studied.
Based on the above background, in this paper, we design two novel static and dynamic ETC schemes under the critic learning architecture for discrete-time nonlinear dynamics with control constraints. It is worth noting that the triggering conditions in these two control schemes are established based on the premise that the constrained controlled system is proved to be uniformly ultimately bounded (UUB). In iterative learning, by introducing an acceleration factor, a new speedy value iteration algorithm is developed to accelerate the iterative convergence rate. In addition, the convergence of the speedy value iteration algorithm is proved. In general, the main contributions are listed as follows.
(1)
Starting from the stability of the nonlinear system with control constraints, a novel static ETC scheme under the ADP framework is exploited to address the optimal regulation problem and realize the purpose of improving resource utilization and avoiding actuator saturation. Moreover, the closed-loop system with control constraints under static ETM is proved to be UUB through classified discussion.
(2)
We introduce a reasonable dynamic variable into the designed static ETC to build an advanced dynamic ETM. The purpose is to further save communication resources. On the other hand, when there are fluctuations between two consecutive samples, the corresponding dynamic triggering condition can be self-regulating. Meanwhile, according to the theoretical analysis of static ETC, the stability of closed-loop system under dynamic ETC is proved.
(3)
In the iteration process, a new speedy value iteration is developed to make the iterative cost function converge faster, and the corresponding convergence is proved. As a result, an initially admissible optimal control policy can be obtained faster than with traditional methods. Furthermore, the superiorities of the designed schemes are illustrated by two experimental simulations.
For ease of reading, all abbreviations in the paper are listed in Table 1.
Notations \(\mathbb {R}\), \(\mathbb {R}^n\), and \(\mathbb {R}^{n\times m}\) denote the set of real numbers, the Euclidean space of all n-dimensional real vectors, and the space of all \(n\times m\) dimensional real matrices, respectively. \(\mathbb {N}\) denotes the set of nonnegative integers. \(I_a\) denotes the \(a\times a\) dimensional identity matrix and “\(\textsf{T}\)” is the transpose operation. \(\Omega \subset { \mathbb {R}}^n\) represents a compact set and \(f(\cdot )\in C^n(\Omega )\) represents that the function \(f(\cdot )\) is n times continuously differentiable on \(\Omega \). \(\lambda _{\min }(Q)\) denotes the minimum eigenvalue of the matrix Q.
2 Problem statement
The plant to be studied is described by the following discrete-time nonlinear system:
where \(x_k\in \Omega \subset { \mathbb {R}}^n\) is the state vector, \(u_k\in \Phi _{u}\) is the control vector, \(\Phi _{u}=\{u_k\in { \mathbb {R}}^m, |u_{ik}|\le \bar{U}, i=1,2,\ldots ,m\}\) with the saturation constraint \(\bar{U}>0\). Assume that the system function \(\mathscr {F}(\cdot )\): \({\mathbb {R}}^n\times { \mathbb {R}}^m\rightarrow { \mathbb {R}}^n\) is continuous and differentiable on \(\Omega \subset { \mathbb {R}}^n\). Moreover, we set the corresponding feedback control law as \(u(x_k)\).
In the time-triggered control process, the control law \(u(x_k)\) is updated at each time step k. This continuously updating control mode makes system stability easier to achieve, but it is inefficient in resource utilization. Conversely, in the ETC process, \(u(x_k)\) is generated only when the designed triggering condition is violated. As such, \(u(x_k)\) is held constant over the interval when the event is not triggered by introducing a zero-order hold (ZOH). This means that, compared with time-triggered control, ETC can greatly reduce the computational burden. At the same time, the stability guarantee of the controlled system is essential under the ETM. For clarity, we define \(\{{k_j}\}^{\infty }_{j=0}\) \((k_0=0)\) as a sequence consisting of event-based sampling instants. It is worth noting that this sequence is monotonically increasing, i.e., \(k_0<k_1<\cdots <k_{\infty }\). Then, the event-based control law \(\mu (x_{k_j})\) satisfies
where \(x_k\) and \(x_{k_j}\) represent the current state and sampling state, respectively.
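The sample-and-hold mechanism described above can be sketched as a generic simulation loop. Here `dynamics`, `controller`, and `trigger` are placeholders for the plant, the event-based control law, and a triggering test, not the paper's concrete choices:

```python
import numpy as np

def simulate_etc(dynamics, controller, trigger, x0, steps):
    """Generic event-triggered loop: the control is recomputed only at
    sampling instants and held by a zero-order hold (ZOH) otherwise."""
    x = np.asarray(x0, dtype=float)
    x_sample = x.copy()           # last sampled state x_{k_j}
    u = controller(x_sample)      # event-based control mu(x_{k_j})
    events = [0]                  # k_0 = 0 is always a sampling instant
    traj = [x.copy()]
    for k in range(1, steps + 1):
        x = dynamics(x, u)        # plant evolves under the held control
        sigma = x - x_sample      # triggered-interval error sigma_k
        if trigger(x, sigma):     # condition violated -> new event
            x_sample = x.copy()
            u = controller(x_sample)
            events.append(k)
        traj.append(x.copy())
    return np.array(traj), events
```

Between events the stored control is simply reused, which is exactly the resource saving the ETM provides over a time-triggered loop.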
Remark 1
In general, the design of an ETC scheme may cause the Zeno phenomenon. However, this phenomenon mainly occurs in continuous-time systems, where it cannot be guaranteed that the next triggering instant occurs strictly after the previous one, since infinitely many triggering instants may accumulate in a finite time interval. Therefore, for the ETC of continuous-time systems, a theoretical proof is usually given to exclude the Zeno phenomenon. In this paper, \(\{{k_j}\}^{\infty }_{j=0}\) is a monotonically increasing sequence of integer triggering instants, which means the next triggering instant always occurs after the previous one. Therefore, the Zeno phenomenon cannot occur in this paper.
Considering the existence of saturation constraint \(\bar{U}\), it can be clearly concluded that \(\left| \mu _i(x_{k_j})\right| \le \bar{U}\). We introduce a variable \(\sigma _k\) as the triggered interval, which is expressed as
Then, system (1) can be redescribed as
In order to effectively address the infinite-horizon optimal control problem of nonlinear systems, we need to design a feedback control sequence that minimizes the cost function. That is
where Q is a positive definite matrix and \(Z(\cdot , \cdot )\ge 0\) is the utility function. In order to overcome the actuator saturation problem, inspired by [20], we define the nonquadratic function \(W\big (\mu (x_{k_j})\big )\) as
where \(\psi ^{-1}(\mu (\cdot ))=[\varphi ^{-1}(\mu _{1}(\cdot )), \varphi ^{-1}(\mu _{2}(\cdot )), \dotsc , \varphi ^{-1}(\mu _{m}(\cdot ))]^{\textsf{T}}\) with \(\psi (\cdot )\in { \mathbb {R}}^m\), and \(R=\text {diag}\{r_1, r_2, \dotsc , r_m\}\) is a positive definite matrix. In addition, it is worth noting that \(\varphi (\cdot )\) is a bounded, strictly monotonically increasing odd function with \(|\varphi (\cdot )| \le 1\), and it belongs to \(C^{b}(b \ge 1)\) and \(L_{2}(\Omega )\). Then, we can determine that the function \(W\big (\mu (x_{k_j})\big )\) is positive definite. For simplicity, according to the characteristics of \(\varphi (\cdot )\), we choose \(\varphi (\cdot )=\text {tanh}(\cdot )\). Without loss of generality, we assume that all diagonal entries of the matrix R are equal, i.e., \(r_1=r_2=\cdots =r_m=r>0\). Hence, the function \(W\big (\mu (x_{k_j})\big )\) can be rewritten as
Then, the following partial derivative can be easily obtained:
According to the optimality principle, we can obtain the optimal control law \(\mu ^*(x_{k_j})\) by solving the following equation:
Hence, \(\mu ^*(x_{k_j})\) is solved and expressed as
Observing (5) and (10), we find that \(\mathscr {J}^*(x_k)\) and \(\mu ^*(x_{k_j})\) can be calculated concretely if the value of \(\mathscr {J}^*(x_{k+1})\) is known. In fact, however, this is difficult for general nonlinear systems. Hence, in order to effectively improve resource utilization while maintaining favorable control performance, an event-based adaptive critic near-optimal control algorithm is developed.
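To make the saturation handling concrete, the following sketch evaluates the tanh-based nonquadratic utility and the saturated control law in their standard constrained-ADP forms. The constants `U_BAR` and `R_SCALAR` are assumptions for illustration, and the exact expressions in the paper may group terms differently:

```python
import numpy as np

U_BAR, R_SCALAR = 2.5, 1.0   # assumed saturation bound and R = r * I_m

def w_closed_form(mu, u_bar=U_BAR, r=R_SCALAR):
    """Nonquadratic utility W(mu) = 2*sum_i int_0^{mu_i} r*u_bar*arctanh(s/u_bar) ds,
    evaluated through its antiderivative (the standard constrained-ADP form)."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    t = mu / u_bar
    return 2.0 * r * u_bar * np.sum(mu * np.arctanh(t)
                                    + (u_bar / 2.0) * np.log(1.0 - t ** 2))

def w_numeric(mu, u_bar=U_BAR, r=R_SCALAR, n=20001):
    """Same integral by the trapezoidal rule, as a sanity check."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    total = 0.0
    for m in mu:
        s = np.linspace(0.0, m, n)
        y = r * u_bar * np.arctanh(s / u_bar)
        total += np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(s))
    return 2.0 * total

def constrained_control(grad_term, u_bar=U_BAR, r=R_SCALAR):
    """tanh-saturated control mu* = -u_bar * tanh(grad_term / (2*r*u_bar)),
    where grad_term stands in for g(x_k)^T dJ*/dx_{k+1}; every component
    automatically satisfies |mu*_i| <= u_bar."""
    return -u_bar * np.tanh(np.asarray(grad_term, dtype=float) / (2.0 * r * u_bar))
```

Because tanh is bounded by one, the saturation constraint is met by construction rather than by clipping.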
3 Static/dynamic ETC design
This section consists of two subsections. In the first subsection, a novel static triggering condition is designed for discrete-time nonlinear systems with control constraints. This triggering condition can ensure the stability of the controlled system when the control constraint is considered. Such stability results are uncommon for discrete-time nonlinear systems with control constraints, which makes the development of this static triggering condition worthwhile. In the second subsection, we introduce a dynamic variable based on the static triggering condition and then design a dynamic ETC method.
3.1 Novel static ETC
The main purpose of designing a reasonable event-triggering condition is to determine the sampling instant \(k_{j+1}\), \(j\in {\mathbb {N}}\). In the static ETM, the selection of sampling instants is related to the current state \(x_k\) and the event-triggered interval \(\sigma _k\). Overall, the simple frame of the static event-triggered control is displayed in Fig. 1. More specifically, the static ETC scheme is applied to deal with the optimal regulation problem of the system (4). The control law \(\mu (x_{k_j})\) is updated at \(k_{j+1}\) that can be determined as follows:
where \(\mathscr {C}(x_k, \sigma _k)\) is the static triggering threshold to be designed next. Before proceeding, we introduce a common lemma and a useful assumption with the same property as those used in [32, 34].
Lemma 1
For arbitrary vectors \(\mathscr {A}\) and \(\mathscr {B}\), and a positive constant \(\varrho \), the inequality
is always true.
Assumption 1
Assume that the control law \(u^*(x_k)\) is Lipschitz continuous for all \(x_k\in \Omega \). That is, there exists a Lipschitz constant \(K_u>0\) such that
Theorem 1
Let \(\mathscr {J}^*(x_k)\) be the solution of the HJB equation (5) while Assumption 1 holds. If the static triggering threshold \(\mathscr {C}(x_k, \sigma _k)\) satisfies
where \(0<\beta <1\) is an adjustable parameter, then the closed-loop system (4) is stable in the sense of UUB under the event-based optimal control law \(\mu ^*(x_{k_j})\). According to (11), the static triggering condition can be reexpressed as
Proof
Observing (5), the first-order difference of the optimal cost function \(\mathscr {J}^*(x_k)\) satisfies
According to (7), one has
Let \(s_i=t_i/\bar{U}\), \(i=1,2,\ldots ,m\). Then, according to variable substitution methods, one has
We already know that \(\left| \mu _i(x_{k_j})\right| <\bar{U}\) and the expression of the inverse hyperbolic tangent function is \(\text {tanh}^{-1}(X)=\dfrac{1}{2}\text {ln}\big ((1+X)/(1-X)\big )\). Therefore, by simplifying (18), we can obtain
Based on the range of \(\mu ^*_i(x_{k_j})\), \(i=1,2,\ldots ,m\), for the control law \(\mu ^*_i(x_{k_j})\) of the same dimension, we have
which implies
for every \(\mu ^*_i(x_{k_j})\). In addition, if \(\mu ^*_i(x_{k_j})<0\), we can easily know that \(\text {ln}\big (1+\mu _i^*(x_{k_j})/\bar{U}\big )<0\). On the contrary, if \(\mu ^*_i(x_{k_j})\ge 0\), we have \(\text {ln}\big (1+\mu _i^*(x_{k_j})/\bar{U}\big )\ge 0\). Then, in order to further analyze the stability of the closed-loop system (4), we will discuss the following three cases.
Case 1: In this case, we assume that \(\kappa \) elements are less than 0 in the event-based optimal control law \(\mu ^*(x_{k_j})\), where \(\kappa \) is a positive integer and satisfies \(1\le \kappa < m\). This indicates that there are \(\tau =m-\kappa \) elements that are not less than 0. For clarity, we design a set \(\{h_1, h_2,\ldots ,h_{\kappa }\}\) to express the corresponding \(\kappa \) elements and design a set \(\{b_1, b_2,\ldots ,b_{\tau }\}\) to express the other \(\tau \) elements. According to (19)–(21), one has
where \(p\in \{1,2,\ldots ,\kappa \}\) and \(q\in \{1,2,\ldots ,\tau \}\). In addition, after the control input is constrained, the element \(|\mu _i^*(x_{k_j})|\) cannot tend to \(\bar{U}\) for any i. This means that \(-\text {ln}\big (1-|\mu _i^*(x_{k_j})|/\bar{U}\big )\) has an upper bound, i.e., there exists a bound \(\delta _M>0\) such that \(-\text {ln}\big (1-|\mu _i^*(x_{k_j})|/\bar{U}\big )\le \delta _M\) for all i. By further derivation, one has
Case 2: In this case, we assume that \(\mu _i(x_{k_j})<0\) for all i. Then, equation (19) becomes
Case 3: In this case, we assume that \(\mu _i(x_{k_j})\ge 0\) for all i. Similarly, we have
Combining the above three cases, we know that \(-W\big (\mu ^*(x_{k_j})\big )\le 2r\delta _M\bar{U}^2 \sum _{i=1}^{m}\Big \{1-\dfrac{\left| \mu _i^*(x_{k_j})\right| }{2\bar{U}}\Big \}\). Then, according to Lemma 1, we have
In addition, we can easily get
Substituting (27) into (26), one has
which yields
By splitting, one has
By applying Assumption 1 and Lemma 1, we have
Substituting (31) into (29), one has
By combining (15) and (32), we can obtain
where
Since Q is a positive definite matrix, we have \(\lambda _{\min }(Q)>0\). Thus, \(\Delta \mathscr {J}^*(x_k)<0\) holds only if the system state \(x_k\) satisfies
This verifies that the event-based closed-loop system (4) is stable in the sense of UUB with the optimal control law \(\mu ^*(x_{k_j})\) in (10). This completes the proof. \(\square \)
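As an illustration only, a static triggering test of the common quadratic form can be coded as follows. The constant `c` merely stands in for the combination of \(\beta \), \(\lambda _{\min }(Q)\), and the Lipschitz constant \(K_u\) that a UUB analysis such as Theorem 1 would produce; it is not the paper's exact threshold:

```python
import numpy as np

def static_trigger(x, sigma, beta=0.2, lam_min_q=1.0, k_u=1.0):
    """Hypothetical static triggering threshold of the common form
    C(x_k, sigma_k) = ||sigma_k||^2 - c(beta) * ||x_k||^2,
    where c(beta) collects the adjustable parameter beta, lambda_min(Q),
    and the Lipschitz constant K_u.  An event fires when the returned
    value is greater than zero."""
    c = beta * lam_min_q / (1.0 + k_u ** 2)
    return float(np.dot(sigma, sigma) - c * np.dot(x, x))
```

Right after a sampling instant \(\sigma _k=0\), so the threshold is negative and no event fires; the event fires only once the gap between the current and sampled states grows large enough relative to \(\Vert x_k\Vert \).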
3.2 Evolved dynamic ETC
Different from the static ETM, the dynamic ETM needs to introduce a dynamic variable \(\zeta _k\). Overall, the simple frame of the dynamic ETC is expressed in Fig. 2. Then, the sampling instant \(k_{j+1}\) can be determined by
where \(\mathscr {C}(x_k, \sigma _k, \zeta _k)\) is the dynamic triggering threshold. Similar to the static ETC method, the event is triggered only when the corresponding triggering threshold is greater than zero. Inspired by [29], \(\mathscr {C}(x_k, \sigma _k, \zeta _k)\) is defined as
and the auxiliary dynamic variable \(\zeta _k\) is specifically defined as
with the variable \(\zeta _{k_j}\ge 0\) while the event is not triggered, i.e., \(k\in [k_j, k_{j+1})\), where \(\gamma \in (0,1)\) and \(\vartheta \ge 1/\gamma \) are two positive scalars. This means that when the event is not triggered, the dynamic variable \(\zeta _k\) is updated through equation (38), and when the triggering condition is violated, the dynamic variable is reassigned.
Lemma 2
Let \(\mathscr {C}(x_k, \sigma _k, \zeta _k)\) and \(\zeta _k\) satisfy (37) and (38), respectively. Then, the dynamic variable \(\zeta _k\) satisfies
for all \(k\in {\mathbb {N}}\).
Proof
According to the dynamic triggering condition designed above, during \(k\in [k_j, k_{j+1})\), we have
which implies
Combining (38) and (41) yields
which leads to
This completes the proof. \(\square \)
It follows from Lemma 2 that the dynamic variable \(\zeta _k\) is nonnegative for all \(k\in {\mathbb {N}}\). Thus, the system state is sampled only when \(\mathscr {C}(x_k, \sigma _k)>\dfrac{1}{\vartheta }\zeta _k\) holds, which is stricter than \(\mathscr {C}(x_k, \sigma _k)>0\) in the static ETM. This means that when the static triggering condition is violated, i.e., \(\mathscr {C}(x_k, \sigma _k)>0\), the dynamic triggering condition is not necessarily violated. Then, we can infer that the number of events released in the dynamic ETM is no more than that in the static ETM.
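A minimal sketch of one dynamic-ETM step, under the assumption that the inter-event update (38) has the common form \(\zeta _{k+1}=\gamma \zeta _k-\mathscr {C}(x_k,\sigma _k)\); with \(\gamma \in (0,1)\) and \(\vartheta \ge 1/\gamma \) this keeps \(\zeta _k\) nonnegative between events, as in Lemma 2:

```python
def dynamic_trigger(zeta, c_static, gamma=0.5, vartheta=4.0):
    """Dynamic triggering test and inter-event update of the auxiliary
    variable, sketched under the assumed update
        zeta_{k+1} = gamma * zeta_k - C(x_k, sigma_k).
    The event fires only when C(x_k, sigma_k) > zeta_k / vartheta, which
    is stricter than the static test C(x_k, sigma_k) > 0 since zeta >= 0.
    At events zeta is reassigned; here it is simply kept unchanged as a
    placeholder for that reassignment."""
    triggered = c_static > zeta / vartheta
    zeta_next = zeta if triggered else gamma * zeta - c_static
    return triggered, zeta_next
```

Note the illustration below the static test: with a positive static threshold (c_static = 0.1) and a positive stored \(\zeta \), the static ETM would fire but the dynamic ETM does not, which is exactly how the dynamic mechanism saves additional events.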
Theorem 2
Assume that the dynamic variable \(\zeta _k\) has an upper bound, i.e., \(\zeta _k\le \zeta _M\) for all k. Then, under the event-based optimal control law \(\mu ^{*}(x_{k_j})\) with the dynamic triggering condition \(\mathscr {C}(x_k, \sigma _k, \zeta _k)\le 0\), the closed-loop system (4) is UUB.
Proof
According to the theoretical proof in the static ETM, then (33) can be further expressed as
According to the value range of \(\vartheta \) and \(\gamma \), we have
Combining (44) and (45), one has
where
Similar to the static ETM, \(\Delta \mathscr {J}^*(x_k)<0\) holds only if the system state \(x_k\) satisfies
This completes the proof. \(\square \)
Remark 2
It can be seen from Theorems 1 and 2 that if the controlled system (4) is stable, condition (48) under the dynamic ETM is stricter than condition (35) under the static ETM. This is mainly because the introduction of the dynamic variable \(\zeta _k\) in the dynamic ETM further expands the triggered interval and also provides resistance to interference. In addition, the static and dynamic triggering conditions developed in this paper are not unique; they change with the adjustable parameter \(\beta \). On the premise that the controlled system is stable, the triggered interval increases as \(\beta \) increases. However, if \(\beta \) is chosen too large, the static/dynamic triggering condition becomes difficult to violate and the control law is not updated for a long time, which affects the stability of the system. Therefore, the selection of the adjustable parameter \(\beta \) is very important. To make the developed algorithm easier to implement, a small value of \(\beta \) is selected in the simulation verification in Section 5.
4 Algorithm implementation
The introduction of the ETM affects the control performance of the system to a certain extent. Therefore, we adopt an integrated idea: (1) a speedy value iteration algorithm under the time-triggered mechanism is used to obtain an acceptable approximate optimal control policy \(u^*(x_k)\). In this algorithm, an acceleration factor \(\alpha \) is introduced to greatly reduce the number of iterations compared with the traditional value iteration method [10]. Then, we treat the obtained \(u^*(x_k)\) as an initial admissible control policy for the controlled system, which aims to ensure the normal operation of the ETC algorithm. (2) Over the time steps, we take the obtained admissible control policy as the starting point, and then build the static/dynamic ETC scheme to reduce the number of control law updates. Therefore, the developed algorithm is highly efficient in terms of both iteration steps and time steps.
In order to facilitate the research of the speedy value iteration algorithm, the optimal cost function \(\mathscr {V}^*(x_k)\) under the time-triggered mechanism is defined as
Then, the corresponding optimal control policy \(u^*(x_k)\) satisfies
4.1 Traditional value iteration
Before analyzing the traditional value iteration algorithm with constrained control, we set a parameter \(l\in {\mathbb {N}}\) as the iteration index. Particularly, when \(l=0\), the initial cost function \(\breve{\mathscr {V}}^{(0)}(\cdot )\) is not less than 0 [9]. Hence, the entire traditional value iteration scheme is carried out between the policy improvement
and the cost function
Observing (51) and (52), we find that \(\breve{u}^{(l)}(0)=0\) and \(\breve{\mathscr {V}}^{(l+1)}(0)\) \(=0\) for any l. By using the similar convergence analysis process as [12], we can easily deduce that when l tends to infinity, \(\breve{\mathscr {V}}^{(l)}(x_k)=\mathscr {V}^*(x_k)\) and \(\breve{u}^{(l)}(x_k)=u^*(x_k)\).
4.2 Speedy value iteration
In order to achieve faster convergence of the iterative cost function, we design a parameter \(\alpha \ge 1\) as the acceleration factor. Then, inspired by [12], the speedy value iteration scheme with constrained control is performed in Algorithm 1. It is worth noting that the entire iterative process is carried out under the time-triggered mechanism.
Remark 3
Observing the above two iteration schemes, we can see that when \(\alpha =1\), the speedy value iteration scheme is equivalent to the traditional value iteration scheme. Therefore, by appropriately increasing the acceleration factor \(\alpha \), the number of iteration steps is greatly reduced when the iterative cost function reaches convergence. However, \(\alpha \) should not be chosen too large, so that the iterative cost function converges to the optimal value rather than diverging.
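The acceleration idea can be illustrated on a scalar problem where the Bellman operator is available in closed form; the update \(p_{l+1}=p_l+\alpha \big (T(p_l)-p_l\big )\) reduces to traditional value iteration at \(\alpha =1\). The scalar LQR data below are assumptions chosen only for illustration, not the paper's examples:

```python
import numpy as np

A, B, Q, R = 0.9, 0.5, 1.0, 1.0    # assumed scalar LQR data

def bellman(p):
    """Exact Bellman (Riccati) update for scalar LQR with V(x) = p*x^2."""
    return Q + A * A * p - (A * B * p) ** 2 / (R + B * B * p)

def speedy_vi(alpha, p0=0.0, tol=1e-10, max_iter=10000):
    """Value iteration with acceleration factor alpha >= 1:
    p_{l+1} = p_l + alpha * (T(p_l) - p_l); alpha = 1 is traditional VI.
    Returns the converged parameter and the number of iterations used."""
    p = p0
    for l in range(1, max_iter + 1):
        p_new = p + alpha * (bellman(p) - p)
        if abs(p_new - p) < tol:
            return p_new, l
        p = p_new
    return p, max_iter
```

On this problem both runs reach the same fixed point, and a moderate \(\alpha >1\) needs visibly fewer iterations, mirroring the role of the acceleration factor in the speedy scheme; an overly large \(\alpha \) would instead make the recursion diverge.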
In the following, the convergence of the iterative cost function sequence \(\{\tilde{\mathscr {V}}^{(l)}(x_k)\}\) is analyzed through a theorem.
Theorem 3
Let the iterative control function \(\tilde{u}^{(l)}(x_k)\) and the iterative cost function \(\tilde{\mathscr {V}}^{(l)}(x_k)\) be obtained by (53) and (54), respectively. Then, suppose there exist scalars \(\eta \), \(\xi _1\), and \(\xi _2\) such that \(0\le \mathscr {V}^{*}(x_{k+1})\le \eta \big (x_k^{\textsf{T}}Qx_k+W(u(x_k))\big )\) and \(0\le \xi _1\mathscr {V}^{*}(x_k)\le \tilde{\mathscr {V}}^{(0)}(x_k)\) \(\le \xi _2\mathscr {V}^{*}(x_k)\), where \(0< \eta < \infty \) and \(0\le \xi _1\le 1< \xi _2<\infty \). If the acceleration factor \(\alpha \) satisfies
where \(0<\mathscr {T}<1\) is a positive scalar and \(\mathscr {L}_{\min }=\min \{1-\xi _1, \xi _2-1\}\), then the iterative cost function \(\tilde{\mathscr {V}}^{(l)}(x_k)\) can approximate the optimal cost function \(\mathscr {V}^{*}(x_k)\) by
Proof
According to (56), one has
which implies
Next, according to the mathematical induction, the left-hand side of (56) can be proved. Letting \(l=1\), we have
Substituting (58) into (59) leads to
Same idea as [12], by recursing \(l-1\) times, the iterative cost function \(\tilde{\mathscr {V}}^{(l)}(x_k)\) satisfies
Similarly, the right half of (56) can be proved by the same method. In particular, when l tends to infinity, one has
which leads to \(\tilde{\mathscr {V}}^{(\infty )}(x_k)=\mathscr {V}^{*}(x_k)\). This completes the proof. \(\square \)
According to the speedy value iteration algorithm, the optimal cost function \(\mathscr {V}^{*}(x_k)\) and the corresponding optimal control law \(u^*(x_k)\) can be easily obtained with fewer iteration steps. In order to further reduce the computational burden in time steps, the static/dynamic ETC method is introduced and performed in Algorithm 2.
5 Simulation studies
In order to support the previous theoretical analysis and further demonstrate the superiority of the proposed algorithm, two experimental examples are provided in this section.
5.1 Example 1
Consider the following inverted pendulum plant:
where the system parameters are provided in Table 2. We set the sampling interval \(\Delta t=0.1\text {s}\), and then the inverted pendulum plant can be discretized into
where the system state \(x_k=[x_{1k},x_{2k}]^{\textsf{T}}=[\pi _k,\varpi _k]^{\textsf{T}}\) and \(x_0=[-1,1]^{\textsf{T}}\). Considering that the system model is unknown, inspired by [19, 28], we build a model network to identify the system dynamics. In addition, some important control parameters are listed in Table 3. In this paper, we can get an acceptable control policy through the offline iterative method, which aims to ensure the controlled performance of the system. Then, in order to verify the influence of the acceleration factor on the iterative convergence speed, we choose three different acceleration factors for the experiment, that is, \(\alpha =1\), \(\alpha =1.5\), and \(\alpha =2\). The corresponding evolution curves of the iterative cost function are displayed in Fig. 3. When \(\alpha =1\), the developed speedy iterative algorithm is equivalent to the traditional method. Hence, it can be easily seen that with the increase of acceleration factor, the convergence speed of the iterative cost function also increases.
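As a sketch of the discretization step, the following applies a forward-Euler rule with \(\Delta t=0.1\,\text {s}\) to a typical inverted-pendulum model. The physical parameters here are assumptions for illustration, since the values in Table 2 are specific to the paper:

```python
import numpy as np

# Assumed pendulum parameters (mass M, length L, gravity G, friction FR);
# the paper's Table 2 values may differ.
M, L, G, FR, DT = 0.5, 1.0, 9.8, 0.2, 0.1

def pendulum_step(x, u):
    """Forward-Euler discretization with Delta_t = 0.1 s of a typical
    inverted pendulum:
        d(theta)/dt = omega,
        d(omega)/dt = (G/L)*sin(theta) - FR/(M*L^2)*omega + u/(M*L^2)."""
    theta, omega = x
    d_omega = (G / L) * np.sin(theta) - FR / (M * L ** 2) * omega + u / (M * L ** 2)
    return np.array([theta + DT * omega, omega + DT * d_omega])
```

The origin is an equilibrium of the discretized model under zero input, which is the regulation target of the designed controllers.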
According to the parameter values given in Table 3, the static triggering condition can be specifically constructed as
Similarly, the dynamic triggering condition can be specifically constructed as
Note that the control law can be updated only when the corresponding triggering condition is violated. The dynamic variable \(\zeta _k\) is nonnegative, which means that the dynamic triggering condition is more difficult to violate than the static triggering condition. Then, the state responses under the three control schemes are shown in Fig. 4, which implies that all system states converge to zero satisfactorily. The control curves under the static ETM and the dynamic ETM are shown in Figs. 5 and 6, respectively. It can be observed that the control curves under these two mechanisms are ladder shaped and that the triggered interval under the dynamic ETM is larger than that under the static ETM. In addition, the control inputs are constrained within \([-2.5,2.5]\). When the designed control schemes are compared with the traditional time-triggered control method, the traditional method cannot even ensure the stability of the controlled system under the same control parameters. Then, the corresponding sampling numbers are given in Fig. 7. The control input under the static ETM is updated 34 times in 100 time steps, whereas it is updated only 14 times under the dynamic ETM.
5.2 Example 2
Consider the following third-order nonlinear dynamics:
where the system state \(x_k=[x_{1k},x_{2k},x_{3k}]^{\textsf{T}}\) with \(x_0=[0.5,0.5,\) \(0.5]^{\textsf{T}}\), the control variable \(u_k=[u_{1k},u_{2k}]^{\textsf{T}}\). Similar to Example 1, some important parameter values of this system are listed in Table 3. We also choose three acceleration factors, that is, \(\alpha =1\), \(\alpha =1.2\), and \(\alpha =1.5\), which aims to verify the effectiveness of the speedy iteration algorithm. The corresponding evolution curves of the iterative cost function are shown in Fig. 8.
According to the parameter values given in Table 3, the static and dynamic triggering conditions can be constructed as
and
respectively. Then, the state responses under the three control schemes are displayed in Fig. 9. Compared with the traditional time-triggered control scheme, the designed ETC schemes also possess a positive convergence effect. The control curves under the static ETM and the dynamic ETM are given in Figs. 10 and 11, respectively. The control inputs under these two control schemes are constrained within \([-1,1]\). Then, the corresponding sampling numbers are shown in Fig. 12. The control inputs under the static ETM are updated 224 times in 500 time steps. In addition, under the dynamic ETM, the control inputs are updated 199 times in 500 time steps. All the experimental results verify the excellent performance of the proposed control methods.
6 Conclusion
In this paper, in order to address the optimal control problem of discrete-time nonlinear dynamics with control constraints and effectively improve the resource utilization rate, we develop two control schemes: static ETC and dynamic ETC. First, a satisfactory static triggering condition is designed from the perspective of stability. Then, on this basis, a dynamic variable is introduced to design a dynamic triggering condition. Note that the control laws under the different control mechanisms are updated only when the corresponding triggering condition is violated. Moreover, in iterative learning, an acceleration factor is introduced to accelerate the iterative convergence. Finally, the effectiveness and superiority of the developed schemes are illustrated by two experimental examples. The experimental results show that the reduction in computational load varies with the controlled system and the tuning parameters when the same control method is applied, and that the dynamic ETC method can further enhance resource utilization compared with the static ETC method. However, the disadvantage of the two ETC methods designed in this paper is that the corresponding triggering conditions need to be checked at every time step. Therefore, in future work, we will study self-triggered control methods for nonlinear systems.
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
Lewis, F.L., Vrabie, D., Vamvoudakis, K.G.: Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers. IEEE Control Syst. Mag. 32(6), 76–105 (2012)
You, L., Jiang, X., Li, B., Zhang, X., Yan, H.: Impulsive layered control of heterogeneous multi-agent systems under limited communication. IEEE Trans. Ind. Inf. 20(3), 5014–5021 (2024)
Rao, J., Wang, J., Xu, J., Zhao, S.: Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces. Nonlinear Dyn. 111, 20041–20053 (2023)
You, L., Jiang, X., Zheng, S., Yan, H.: Communication limited hybrid impulsive control of fuzzy time-delay multiagent network. IEEE Trans. Fuzzy Syst. 32(1), 152–159 (2024)
Huo, Y., Wang, D., Qiao, J., Li, M.: Adaptive critic design for nonlinear multi-player zero-sum games with unknown dynamics and control constraints. Nonlinear Dyn. 111, 11671–11683 (2023)
Lewis, F.L., Vrabie, D.: Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 9(3), 32–50 (2009)
Zhao, M., Wang, D., Qiao, J., Ha, M., Ren, J.: Advanced value iteration for discrete-time intelligent critic control: a survey. Artif. Intell. Rev. 56, 12315–12346 (2023)
Wang, D., Wang, J., Zhao, M., Xin, P., Qiao, J.: Adaptive multi-step evaluation design with stability guarantee for discrete-time optimal learning control. IEEE/CAA J. Autom. Sin. 10(9), 1797–1809 (2023)
Zhao, M., Wang, D., Ha, M., Qiao, J.: Evolving and incremental value iteration schemes for nonlinear discrete-time zero-sum games. IEEE Trans. Cybern. 53(7), 4487–4499 (2023)
Al-Tamimi, A., Lewis, F.L., Abu-Khalaf, M.: Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans. Syst. Man Cybern. B Cybern. 38(4), 943–949 (2008)
Liu, D., Wei, Q.: Generalized policy iteration adaptive dynamic programming for discrete-time nonlinear systems. IEEE Trans. Syst. Man Cybern. Syst. 45(12), 1577–1591 (2015)
Ha, M., Wang, D., Liu, D.: A novel value iteration scheme with adjustable convergence rate. IEEE Trans. Neural Netw. Learn. Syst. 34(10), 7430–7442 (2023)
Wang, D., Gao, N., Liu, D., Li, J., Lewis, F.L.: Recent progress in reinforcement learning and adaptive dynamic programming for advanced control applications. IEEE/CAA J. Autom. Sin. 11(1), 18–36 (2024)
Li, C., Ding, J., Lewis, F.L., Chai, T.: A novel adaptive dynamic programming based on tracking error for nonlinear discrete-time systems. Automatica 129(109687), 1–9 (2021)
Ha, M., Wang, D., Liu, D.: Discounted iterative adaptive critic designs with novel stability analysis for tracking control. IEEE/CAA J. Autom. Sin. 9(7), 1262–1272 (2022)
Wang, D., Zhao, M., Ha, M., Qiao, J.: Intelligent optimal tracking with application verifications via discounted generalized value iteration. Acta Autom. Sin. 48(1), 182–193 (2022)
Yang, Y., Gao, W., Modares, H., Xu, C.Z.: Robust actor-critic learning for continuous-time nonlinear systems with unmodeled dynamics. IEEE Trans. Fuzzy Syst. 30(6), 2101–2112 (2022)
Xu, H., Jagannathan, S., Lewis, F.L.: Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses. Automatica 48(6), 1017–1030 (2012)
Wang, D., Hu, L., Zhao, M., Qiao, J.: Dual event-triggered constrained control through adaptive critic for discrete-time zero-sum games. IEEE Trans. Syst. Man Cybern. Syst. 53(3), 1584–1595 (2023)
Liu, D., Yang, X., Wang, D., Wei, Q.: Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Trans. Cybern. 45(7), 1372–1385 (2015)
Sussmann, H.J., Sontag, E.D., Yang, Y.: A general result on the stabilization of linear systems using bounded controls. IEEE Trans. Autom. Control 39(12), 2411–2425 (1994)
Yang, X., Wei, Q.: Adaptive critic learning for constrained optimal event-triggered control with discounted cost. IEEE Trans. Neural Netw. Learn. Syst. 32(1), 91–104 (2021)
Yang, X., Zhou, B., Mazenc, F., Lam, J.: Global stabilization of discrete-time linear systems subject to input saturation and time delay. IEEE Trans. Autom. Control 66(3), 1345–1352 (2021)
Postoyan, R., Tabuada, P., Nesic, D., Anta, A.: A framework for the event-triggered stabilization of nonlinear systems. IEEE Trans. Autom. Control 60(4), 982–996 (2015)
Tallapragada, P., Chopra, N.: On event triggered tracking for nonlinear systems. IEEE Trans. Autom. Control 58(9), 2343–2348 (2013)
Vamvoudakis, K.G., Mojoodi, A., Ferraz, H.: Event-triggered optimal tracking control of nonlinear systems. Int. J. Robust Nonlinear Control 27(4), 598–619 (2017)
Wang, D., Zhou, Z., Liu, A., Qiao, J.: Event-triggered robust adaptive critic control for nonlinear disturbed systems. Nonlinear Dyn. 111, 19963–19977 (2023)
Wang, D., Ha, M., Qiao, J.: Self-learning optimal regulation for discrete-time nonlinear systems under event-driven formulation. IEEE Trans. Autom. Control 65(3), 1272–1279 (2020)
Ge, X., Han, Q., Wang, Z.: A dynamic event-triggered transmission scheme for distributed set-membership estimation over wireless sensor networks. IEEE Trans. Cybern. 49(1), 171–183 (2019)
Batmani, Y., Davoodi, M., Meskin, N.: Event-triggered suboptimal tracking controller design for a class of nonlinear discrete-time systems. IEEE Trans. Ind. Electron. 64(10), 8079–8087 (2017)
Zhang, K., Zhou, B., Zheng, W., Duan, G.: Event-triggered and self-triggered gain scheduled control of linear systems with input constraints. IEEE Trans. Syst. Man Cybern. Syst. 52(10), 6452–6463 (2022)
Vamvoudakis, K.G.: Event-triggered optimal adaptive control algorithm for continuous-time nonlinear systems. IEEE/CAA J. Autom. Sin. 1(3), 282–293 (2014)
Heemels, W.P.M.H., Donkers, M.C.F., Teel, A.R.: Periodic event-triggered control for linear systems. IEEE Trans. Autom. Control 58(4), 847–861 (2013)
Zhu, Y., Zhao, D., He, H., Ji, J.: Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming. IEEE Trans. Ind. Electron. 64(5), 4101–4109 (2017)
Funding
This work was supported in part by the National Natural Science Foundation of China under Grants 62222301, 61890930-5, and 62021003; in part by the National Science and Technology Major Project under Grants 2021ZD0112302 and 2021ZD0112301; and in part by the Beijing Natural Science Foundation under Grant JQ19013.
Author information
Authors and Affiliations
Contributions
All authors made individual contributions. LH: formal analysis; validation; writing—original draft. DW: investigation; supervision; writing—review and editing. JQ: methodology; supervision.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Hu, L., Wang, D. & Qiao, J. Static/dynamic event-triggered learning control for constrained nonlinear systems. Nonlinear Dyn 112, 14159–14174 (2024). https://doi.org/10.1007/s11071-024-09778-3