Abstract
This paper designs two novel event-triggered control (ETC) schemes based on the critic learning technique for constrained discrete-time nonlinear systems. First, starting from the stability of the constrained system, a static ETC method is developed to reduce the computational burden. Then, a nonnegative dynamic variable is introduced into the static event-triggered mechanism, so as to establish the dynamic ETC method, which further improves the resource utilization rate and possesses the anti-interference ability. Moreover, a speedy value iteration architecture is designed to obtain an initially admissible optimal control policy, which can ensure the normal execution of the designed ETC methods. Finally, two experimental examples are provided to illustrate the effectiveness and superiority of the developed schemes.
1 Introduction
With the rapid development of intelligent control [1,2,3,4], adaptive dynamic programming (ADP) [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] is regarded as a promising scheme to accomplish intelligent optimization by introducing an evaluation component. This is mainly because numerical solutions of Hamilton-Jacobi-Bellman (HJB) equations can be approximately obtained by ADP algorithms. Therefore, ADP algorithms are often used by researchers in related fields to deal with complex nonlinear control problems. Value iteration [9, 10] and policy iteration [11] are the two main forms of ADP algorithms. The essence of value iteration is to obtain an approximately optimal control sequence through continuous iteration between policy evaluation and policy improvement. In particular, Al-Tamimi et al. [10] proved the convergence of the value iteration algorithm in theory, which greatly promoted the development of ADP algorithms. Compared with value iteration, the control policy generated by policy iteration possesses a stability guarantee, but policy iteration requires an initial stabilizing control law. In order to improve the iteration convergence speed, Ha et al. [12] constructed a new value iteration architecture. So far, a great deal of work has been conducted to solve various control problems using ADP methods, such as trajectory tracking control [14,15,16], robust control [17], networked control [18], event-triggered control (ETC) [19], and constrained control [20]. This fully demonstrates the applicability and great potential of ADP algorithms.
In the control process, we often encounter difficulties caused by actuator saturation, which may lead to degraded system performance or even loss of the stability guarantee. A large number of studies have shown that constraining the control input within a reasonable limit can not only effectively solve the actuator saturation problem but also ensure excellent control performance [21]. In general, the constraint effect can be divided into symmetric constraints [20] and asymmetric constraints [22]. In order to associate the ADP algorithm with control constraints, a dual ETC scheme with critic learning was developed to control constrained nonlinear systems [19]. For constrained linear systems, a feedback controller was designed to study the global stability problem in [23]. Nevertheless, unlike constrained linear systems, constrained nonlinear systems are more difficult to handle in the control process. Up to now, most previous work on constrained nonlinear systems has focused on application methods rather than the stability of the system, which leaves the theoretical support inadequate.
Due to the increasing scale of complex nonlinear systems, the communication burden problem is becoming increasingly serious. Therefore, control methods that can reduce the computational burden, such as ETC [24,25,26,27,28,29,30,31,32,33], have received extensive attention. The essence of ETC is to determine a satisfactory triggering condition: the control law is updated only when this triggering condition is violated, which improves resource utilization. This is more efficient than time-triggered control, which updates the control law at every time step. In addition, the stability of the controlled system must be ensured whenever the triggering condition is applied. Hence, it is necessary to prove the stability of the system under the ETC scheme when the event is not triggered. This phased updating control mode is particularly suitable for embedded systems and networked control systems [24]. Through in-depth study, ETC has evolved into two kinds: static ETC [28] and dynamic ETC [29]. Dynamic ETC is established by introducing a dynamic variable into the static ETC architecture, which further reduces the computational burden compared with static ETC. In addition, the triggering condition in the dynamic event-triggered mechanism (ETM) can adjust itself when interference is encountered, a capability that static ETC lacks. However, the dynamic variable to be designed is usually related to the triggering condition in the static ETM, which is not easy to construct. Thus, most of the relevant work to date has studied static ETC. In [30], an ETC scheme was developed to deal with the suboptimal tracking control problem for nonlinear systems. In [28], Wang et al. developed an event-based iterative critic learning algorithm and proved that the controlled system was stable from the perspective of input-to-state stability.
With the further study of ETC, relevant researchers have developed a dynamic ETC method [29]. In this control method, dynamic events are monitored and identified, and corresponding control strategies are adopted to maintain the stability and performance of the controlled system. In [31], a dynamic ETC method was designed for discrete-time linear systems. However, the dynamic ETC method for nonlinear systems remains to be studied.
Based on the above background, in this paper, we design two novel static and dynamic ETC schemes under the critic learning architecture for discrete-time nonlinear dynamics with control constraints. It is worth noting that the triggering conditions in these two control schemes are established based on the premise that the constrained controlled system is proved to be uniformly ultimately bounded (UUB). In iterative learning, by introducing an acceleration factor, a new speedy value iteration algorithm is developed to accelerate the iterative convergence rate. In addition, the convergence of the speedy value iteration algorithm is proved. In general, the main contributions are listed as follows.
(1)
Starting from the stability of the nonlinear system with control constraints, a novel static ETC scheme under the ADP framework is exploited to address the optimal regulation problem and realize the purpose of improving resource utilization and avoiding actuator saturation. Moreover, the closed-loop system with control constraints under static ETM is proved to be UUB through classified discussion.
(2)
We introduce a reasonable dynamic variable into the designed static ETC to build an advanced dynamic ETM. The purpose is to further save communication resources. On the other hand, when there are fluctuations between two consecutive samples, the corresponding dynamic triggering condition can be self-regulating. Meanwhile, according to the theoretical analysis of static ETC, the stability of closed-loop system under dynamic ETC is proved.
(3)
In the iteration process, a new speedy value iteration is developed to make the iterative cost function converge faster, and the corresponding convergence is proved. As a result, an initially admissible optimal control policy can be obtained faster than with traditional methods. Furthermore, the superiorities of the designed schemes are illustrated by two experimental simulations.
For ease of reading, all abbreviations in the paper are listed in Table 1.
Notations \(\mathbb {R}\), \(\mathbb {R}^n\), and \(\mathbb {R}^{n\times m}\) denote the set of real numbers, the Euclidean space of all n-dimensional real vectors, and the space of all \(n\times m\) dimensional real matrices, respectively. \(\mathbb {N}\) denotes the set of nonnegative integers. \(I_a\) denotes the \(a\times a\) dimensional identity matrix and “\(\textsf{T}\)” is the transpose operation. \(\Omega \subset { \mathbb {R}}^n\) represents a compact set and \(f(\cdot )\in C^n(\Omega )\) represents that the function \(f(\cdot )\) is n times continuously differentiable on \(\Omega \). \(\lambda _{\min }(Q)\) denotes the minimum eigenvalue of the matrix Q.
2 Problem statement
The plant to be studied is described by the following discrete-time nonlinear system:
where \(x_k\in \Omega \subset { \mathbb {R}}^n\) is the state vector, \(u_k\in \Phi _{u}\) is the control vector, \(\Phi _{u}=\{u_k\in { \mathbb {R}}^m, |u_{ik}|\le \bar{U}, i=1,2,\ldots ,m\}\) with the saturation constraint \(\bar{U}>0\). Assume that the system function \(\mathscr {F}(\cdot )\): \({\mathbb {R}}^n\times { \mathbb {R}}^m\rightarrow { \mathbb {R}}^n\) is continuous and differentiable on \(\Omega \subset { \mathbb {R}}^n\). Moreover, we set the corresponding feedback control law as \(u(x_k)\).
In the time-triggered control process, the control law \(u(x_k)\) is updated at each time step k. This continuously updating control mode makes system stability easier to achieve, but it is inefficient in resource utilization. Conversely, in the ETC process, \(u(x_k)\) is generated only when the designed triggering condition is violated. As such, \(u(x_k)\) is held constant over the interval when the event is not triggered by introducing a zero-order hold (ZOH). This means that, compared with time-triggered control, ETC can greatly reduce the computational burden. At the same time, the stability guarantee of the controlled system is essential under the ETM. For clarity, we define \(\{{k_j}\}^{\infty }_{j=0}\) \((k_0=0)\) as a sequence consisting of event-based sampling instants. It is worth noting that this sequence is monotonically increasing, i.e., \(k_0<k_1<\cdots <k_{\infty }\). Then, the event-based control law \(\mu (x_{k_j})\) satisfies
where \(x_k\) and \(x_{k_j}\) represent the current state and sampling state, respectively.
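The sample-and-hold mechanism described above can be sketched as a generic simulation loop. Here `dynamics`, `controller`, and `trigger` are placeholders for the plant, the event-based control law, and a triggering test, not the paper's concrete choices:

```python
import numpy as np

def simulate_etc(dynamics, controller, trigger, x0, steps):
    """Generic event-triggered loop: the control is recomputed only at
    sampling instants and held by a zero-order hold (ZOH) otherwise."""
    x = np.asarray(x0, dtype=float)
    x_sample = x.copy()           # last sampled state x_{k_j}
    u = controller(x_sample)      # event-based control mu(x_{k_j})
    events = [0]                  # k_0 = 0 is always a sampling instant
    traj = [x.copy()]
    for k in range(1, steps + 1):
        x = dynamics(x, u)        # plant evolves under the held control
        sigma = x - x_sample      # triggered-interval error sigma_k
        if trigger(x, sigma):     # condition violated -> new event
            x_sample = x.copy()
            u = controller(x_sample)
            events.append(k)
        traj.append(x.copy())
    return np.array(traj), events
```

Between events the stored control is simply reused, which is exactly the resource saving the ETM provides over a time-triggered loop.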
Remark 1
In general, the design of an ETC scheme may cause the Zeno phenomenon. However, this phenomenon mainly occurs in continuous-time systems, where it cannot be guaranteed that the next triggering instant occurs strictly after the previous one, since infinitely many triggering instants may accumulate in a finite time interval. Therefore, for the ETC of continuous-time systems, a theoretical proof is usually given to exclude the Zeno phenomenon. In this paper, \(\{{k_j}\}^{\infty }_{j=0}\) is a monotonically increasing sequence of integer triggering instants, which means the next triggering instant always occurs after the previous one. Therefore, the Zeno phenomenon cannot occur in this paper.
Considering the existence of saturation constraint \(\bar{U}\), it can be clearly concluded that \(\left| \mu _i(x_{k_j})\right| \le \bar{U}\). We introduce a variable \(\sigma _k\) as the triggered interval, which is expressed as
Then, system (1) can be redescribed as
In order to effectively address the infinite-horizon optimal control problem of nonlinear systems, we need to design a feedback control sequence that minimizes the cost function. That is
where Q is a positive definite matrix and \(Z(\cdot , \cdot )\ge 0\) is the utility function. In order to overcome the actuator saturation problem, inspired by [20], we define the nonquadratic function \(W\big (\mu (x_{k_j})\big )\) as
where \(\psi ^{-1}(\mu (\cdot ))=[\varphi ^{-1}(\mu _{1}(\cdot )), \varphi ^{-1}(\mu _{2}(\cdot )), \dotsc , \varphi ^{-1}(\mu _{m}(\cdot ))]^{\textsf{T}}\) with \(\psi (\cdot )\in { \mathbb {R}}^m\), and \(R=\text {diag}\{r_1, r_2, \dotsc , r_m\}\) is a positive definite matrix. In addition, it is worth noting that \(\varphi (\cdot )\) is a bounded, strictly monotonically increasing odd function with \(|\varphi (\cdot )| \le 1\), and it belongs to \(C^{b}(b \ge 1)\) and \(L_{2}(\Omega )\). Then, we can determine that the function \(W\big (\mu (x_{k_j})\big )\) is positive definite. For simplicity, according to the characteristics of \(\varphi (\cdot )\), we choose \(\varphi (\cdot )=\text {tanh}(\cdot )\). Without loss of generality, we assume that all diagonal entries of the matrix R are equal, i.e., \(r_1=r_2=\cdots =r_m=r>0\). Hence, the function \(W\big (\mu (x_{k_j})\big )\) can be rewritten as
Then, the following partial derivative can be easily obtained:
According to the optimality principle, we can obtain the optimal control law \(\mu ^*(x_{k_j})\) by solving the following equation:
Hence, \(\mu ^*(x_{k_j})\) is solved and expressed as
Observing (5) and (10), we find that \(\mathscr {J}^*(x_k)\) and \(\mu ^*(x_{k_j})\) can be calculated concretely if the value of \(\mathscr {J}^*(x_{k+1})\) is known. In fact, however, this is difficult for general nonlinear systems. Hence, in order to effectively improve resource utilization while maintaining favorable control performance, an event-based adaptive critic near-optimal control algorithm is developed.
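To make the saturation handling concrete, the following sketch evaluates the tanh-based nonquadratic utility and the saturated control law in their standard constrained-ADP forms. The constants `U_BAR` and `R_SCALAR` are assumptions for illustration, and the exact expressions in the paper may group terms differently:

```python
import numpy as np

U_BAR, R_SCALAR = 2.5, 1.0   # assumed saturation bound and R = r * I_m

def w_closed_form(mu, u_bar=U_BAR, r=R_SCALAR):
    """Nonquadratic utility W(mu) = 2*sum_i int_0^{mu_i} r*u_bar*arctanh(s/u_bar) ds,
    evaluated through its antiderivative (the standard constrained-ADP form)."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    t = mu / u_bar
    return 2.0 * r * u_bar * np.sum(mu * np.arctanh(t)
                                    + (u_bar / 2.0) * np.log(1.0 - t ** 2))

def w_numeric(mu, u_bar=U_BAR, r=R_SCALAR, n=20001):
    """Same integral by the trapezoidal rule, as a sanity check."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    total = 0.0
    for m in mu:
        s = np.linspace(0.0, m, n)
        y = r * u_bar * np.arctanh(s / u_bar)
        total += np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(s))
    return 2.0 * total

def constrained_control(grad_term, u_bar=U_BAR, r=R_SCALAR):
    """tanh-saturated control mu* = -u_bar * tanh(grad_term / (2*r*u_bar)),
    where grad_term stands in for g(x_k)^T dJ*/dx_{k+1}; every component
    automatically satisfies |mu*_i| <= u_bar."""
    return -u_bar * np.tanh(np.asarray(grad_term, dtype=float) / (2.0 * r * u_bar))
```

Because tanh is bounded by one, the saturation constraint is met by construction rather than by clipping.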
3 Static/dynamic ETC design
This section consists of two subsections. In the first subsection, a novel static triggering condition is designed for discrete-time nonlinear systems with control constraints. This triggering condition can ensure the stability of the controlled system when the control constraint is considered. Such stability results are uncommon for discrete-time nonlinear systems with control constraints, which makes the development of this static triggering condition worthwhile. In the second subsection, we introduce a dynamic variable based on the static triggering condition and then design a dynamic ETC method.
3.1 Novel static ETC
The main purpose of designing a reasonable event-triggering condition is to determine the sampling instant \(k_{j+1}\), \(j\in {\mathbb {N}}\). In the static ETM, the selection of sampling instants is related to the current state \(x_k\) and the event-triggered interval \(\sigma _k\). Overall, the simple frame of the static event-triggered control is displayed in Fig. 1. More specifically, the static ETC scheme is applied to deal with the optimal regulation problem of the system (4). The control law \(\mu (x_{k_j})\) is updated at \(k_{j+1}\) that can be determined as follows:
where \(\mathscr {C}(x_k, \sigma _k)\) is the static triggering threshold to be designed next. Before proceeding, we introduce a common lemma and a useful assumption with the same property as those used in [32, 34].
Lemma 1
For arbitrary vectors \(\mathscr {A}\) and \(\mathscr {B}\), and a positive constant \(\varrho \), the inequality
is always true.
Assumption 1
Assume that the control law \(u^*(x_k)\) is Lipschitz continuous for all \(x_k\in \Omega \). That is, there exists a Lipschitz constant \(K_u>0\) such that
Theorem 1
Let \(\mathscr {J}^*(x_k)\) be the solution of the HJB equation (5) while Assumption 1 holds. If the static triggering threshold \(\mathscr {C}(x_k, \sigma _k)\) satisfies
where \(0<\beta <1\) is an adjustable parameter, then the closed-loop system (4) is stable in the sense of UUB under the event-based optimal control law \(\mu ^*(x_{k_j})\). According to (11), the static triggering condition can be reexpressed as
Proof
Observing (5), the first-order difference of the optimal cost function \(\mathscr {J}^*(x_k)\) satisfies
According to (7), one has
Let \(s_i=t_i/\bar{U}\), \(i=1,2,\ldots ,m\). Then, according to variable substitution methods, one has
We already know that \(\left| \mu _i(x_{k_j})\right| <\bar{U}\) and the expression of the inverse hyperbolic tangent function is \(\text {tanh}^{-1}(X)=\dfrac{1}{2}\text {ln}\big ((1+X)/(1-X)\big )\). Therefore, by simplifying (18), we can obtain
Based on the range of \(\mu ^*_i(x_{k_j})\), \(i=1,2,\ldots ,m\), for the control law \(\mu ^*_i(x_{k_j})\) of the same dimension, we have
which implies
for every \(\mu ^*_i(x_{k_j})\). In addition, if \(\mu ^*_i(x_{k_j})<0\), we can easily know that \(\text {ln}\big (1+\mu _i^*(x_{k_j})/\bar{U}\big )<0\). On the contrary, if \(\mu ^*_i(x_{k_j})\ge 0\), we have \(\text {ln}\big (1+\mu _i^*(x_{k_j})/\bar{U}\big )\ge 0\). Then, in order to further analyze the stability of the closed-loop system (4), we will discuss the following three cases.
Case 1: In this case, we assume that \(\kappa \) elements are less than 0 in the event-based optimal control law \(\mu ^*(x_{k_j})\), where \(\kappa \) is a positive integer and satisfies \(1\le \kappa < m\). This indicates that there are \(\tau =m-\kappa \) elements that are not less than 0. For clarity, we design a set \(\{h_1, h_2,\ldots ,h_{\kappa }\}\) to express the corresponding \(\kappa \) elements and design a set \(\{b_1, b_2,\ldots ,b_{\tau }\}\) to express the other \(\tau \) elements. According to (19)–(21), one has
where \(p\in \{1,2,\ldots ,\kappa \}\) and \(q\in \{1,2,\ldots ,\tau \}\). In addition, after the control input is constrained, the element \(|\mu _i^*(x_{k_j})|\) cannot tend to \(\bar{U}\) for any i. This means that \(-\text {ln}\big (1-|\mu _i^*(x_{k_j})|/\bar{U}\big )\) has an upper bound, i.e., there exists a bound \(\delta _M>0\) such that \(-\text {ln}\big (1-|\mu _i^*(x_{k_j})|/\bar{U}\big )\le \delta _M\) for all i. By further derivation, one has
Case 2: In this case, we assume that \(\mu _i(x_{k_j})<0\) for all i. Then, equation (19) becomes
Case 3: In this case, we assume that \(\mu _i(x_{k_j})\ge 0\) for all i. Similarly, we have
Combining the above three cases, we know that \(-W\big (\mu ^*(x_{k_j})\big )\le 2r\delta _M\bar{U}^2 \sum _{i=1}^{m}\Big \{1-\dfrac{\left| \mu _i^*(x_{k_j})\right| }{2\bar{U}}\Big \}\). Then, according to Lemma 1, we have
In addition, we can easily get
Substituting (27) into (26), one has
which yields
By splitting, one has
By applying Assumption 1 and Lemma 1, we have
Substituting (31) into (29), one has
By combining (15) and (32), we can obtain
where
Since Q is a positive definite matrix, we have \(\lambda _{\min }(Q)>0\). Thus, \(\Delta \mathscr {J}^*(x_k)<0\) holds only if the system state \(x_k\) satisfies
This verifies that the event-based closed-loop system (4) is stable in the sense of UUB with the optimal control law \(\mu ^*(x_{k_j})\) in (10). This completes the proof. \(\square \)
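As an illustration only, a static triggering test of the common quadratic form can be coded as follows. The constant `c` merely stands in for the combination of \(\beta \), \(\lambda _{\min }(Q)\), and the Lipschitz constant \(K_u\) that a UUB analysis such as Theorem 1 would produce; it is not the paper's exact threshold:

```python
import numpy as np

def static_trigger(x, sigma, beta=0.2, lam_min_q=1.0, k_u=1.0):
    """Hypothetical static triggering threshold of the common form
    C(x_k, sigma_k) = ||sigma_k||^2 - c(beta) * ||x_k||^2,
    where c(beta) collects the adjustable parameter beta, lambda_min(Q),
    and the Lipschitz constant K_u.  An event fires when the returned
    value is greater than zero."""
    c = beta * lam_min_q / (1.0 + k_u ** 2)
    return float(np.dot(sigma, sigma) - c * np.dot(x, x))
```

Right after a sampling instant \(\sigma _k=0\), so the threshold is negative and no event fires; the event fires only once the gap between the current and sampled states grows large enough relative to \(\Vert x_k\Vert \).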
3.2 Evolved dynamic ETC
Different from the static ETM, the dynamic ETM needs to introduce a dynamic variable \(\zeta _k\). Overall, the simple frame of the dynamic ETC is expressed in Fig. 2. Then, the sampling instant \(k_{j+1}\) can be determined by
where \(\mathscr {C}(x_k, \sigma _k, \zeta _k)\) is the dynamic triggering threshold. Similar to the static ETC method, the event is triggered only when the corresponding triggering threshold is greater than zero. Inspired by [29], \(\mathscr {C}(x_k, \sigma _k, \zeta _k)\) is defined as
and the auxiliary dynamic variable \(\zeta _k\) is specifically defined as
with the variable \(\zeta _{k_j}\ge 0\) while the event is not triggered, i.e., \(k\in [k_j, k_{j+1})\), where \(\gamma \in (0,1)\) and \(\vartheta \ge 1/\gamma \) are two positive scalars. This means that when the event is not triggered, the dynamic variable \(\zeta _k\) is updated through equation (38), and when the triggering condition is violated, the dynamic variable is reassigned.
Lemma 2
Let \(\mathscr {C}(x_k, \sigma _k, \zeta _k)\) and \(\zeta _k\) satisfy (37) and (38), respectively. Then, the dynamic variable \(\zeta _k\) satisfies
for all \(k\in {\mathbb {N}}\).
Proof
According to the dynamic triggering condition designed above, during \(k\in [k_j, k_{j+1})\), we have
which implies
Combining (38) and (41) yields
which leads to
This completes the proof. \(\square \)
It follows from Lemma 2 that the dynamic variable \(\zeta _k\) is nonnegative for all \(k\in {\mathbb {N}}\). Thus, the system state is sampled only when \(\mathscr {C}(x_k, \sigma _k)>\dfrac{1}{\vartheta }\zeta _k\) holds, which is stricter than \(\mathscr {C}(x_k, \sigma _k)>0\) in the static ETM. This means that when the static triggering condition is violated, i.e., \(\mathscr {C}(x_k, \sigma _k)>0\), the dynamic triggering condition is not necessarily violated. Then, we can infer that the number of events released in the dynamic ETM is no more than that in the static ETM.
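A minimal sketch of one dynamic-ETM step, under the assumption that the inter-event update (38) has the common form \(\zeta _{k+1}=\gamma \zeta _k-\mathscr {C}(x_k,\sigma _k)\); with \(\gamma \in (0,1)\) and \(\vartheta \ge 1/\gamma \) this keeps \(\zeta _k\) nonnegative between events, as in Lemma 2:

```python
def dynamic_trigger(zeta, c_static, gamma=0.5, vartheta=4.0):
    """Dynamic triggering test and inter-event update of the auxiliary
    variable, sketched under the assumed update
        zeta_{k+1} = gamma * zeta_k - C(x_k, sigma_k).
    The event fires only when C(x_k, sigma_k) > zeta_k / vartheta, which
    is stricter than the static test C(x_k, sigma_k) > 0 since zeta >= 0.
    At events zeta is reassigned; here it is simply kept unchanged as a
    placeholder for that reassignment."""
    triggered = c_static > zeta / vartheta
    zeta_next = zeta if triggered else gamma * zeta - c_static
    return triggered, zeta_next
```

Note the illustration below the static test: with a positive static threshold (c_static = 0.1) and a positive stored \(\zeta \), the static ETM would fire but the dynamic ETM does not, which is exactly how the dynamic mechanism saves additional events.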
Theorem 2
Assume that the dynamic variable \(\zeta _k\) has an upper bound, i.e., \(\zeta _k\le \zeta _M\) for all k. Then, under the event-based optimal control law \(\mu ^{*}(x_{k_j})\) with the dynamic triggering condition \(\mathscr {C}(x_k, \sigma _k, \zeta _k)\le 0\), the closed-loop system (4) is UUB.
Proof
According to the theoretical proof in the static ETM, then (33) can be further expressed as
According to the value range of \(\vartheta \) and \(\gamma \), we have
Combining (44) and (45), one has
where
Similar to the static ETM, \(\Delta \mathscr {J}^*(x_k)<0\) holds only if the system state \(x_k\) satisfies
This completes the proof. \(\square \)
Remark 2
It can be seen from Theorems 1 and 2 that if the controlled system (4) is stable, condition (48) under the dynamic ETM is stricter than condition (35) under the static ETM. This is mainly because the introduction of the dynamic variable \(\zeta _k\) in the dynamic ETM further expands the triggered interval and also provides resistance to interference. In addition, the static and dynamic triggering conditions developed in this paper are not unique; they change with the adjustable parameter \(\beta \). On the premise that the controlled system is stable, the triggered interval increases as \(\beta \) increases. However, if \(\beta \) is chosen too large, the static/dynamic triggering condition becomes difficult to violate and the control law is not updated for a long time, which affects the stability of the system. Therefore, the selection of the adjustable parameter \(\beta \) is very important. To make the developed algorithm easier to implement, a small value of \(\beta \) is selected in the simulation verification in Section 5.
4 Algorithm implementation
The introduction of the ETM affects the control performance of the system to a certain extent. Therefore, we adopt an integrated idea: (1) a speedy value iteration algorithm under the time-triggered mechanism is used to obtain an acceptable approximate optimal control policy \(u^*(x_k)\). In this algorithm, an acceleration factor \(\alpha \) is introduced to greatly reduce the number of iterations compared with the traditional value iteration method [10]. Then, we treat the obtained \(u^*(x_k)\) as an initial admissible control policy for the controlled system, which aims to ensure the normal operation of the ETC algorithm. (2) Over the time steps, we take the obtained admissible control policy as the starting point, and then build the static/dynamic ETC scheme to reduce the number of control law updates. Therefore, the developed algorithm is highly efficient in terms of both iteration steps and time steps.
In order to facilitate the research of the speedy value iteration algorithm, the optimal cost function \(\mathscr {V}^*(x_k)\) under the time-triggered mechanism is defined as
Then, the corresponding optimal control policy \(u^*(x_k)\) satisfies
4.1 Traditional value iteration
Before analyzing the traditional value iteration algorithm with constrained control, we set a parameter \(l\in {\mathbb {N}}\) as the iteration index. Particularly, when \(l=0\), the initial cost function \(\breve{\mathscr {V}}^{(0)}(\cdot )\) is not less than 0 [9]. Hence, the entire traditional value iteration scheme is carried out between the policy improvement
and the cost function
Observing (51) and (52), we find that \(\breve{u}^{(l)}(0)=0\) and \(\breve{\mathscr {V}}^{(l+1)}(0)\) \(=0\) for any l. By using the similar convergence analysis process as [12], we can easily deduce that when l tends to infinity, \(\breve{\mathscr {V}}^{(l)}(x_k)=\mathscr {V}^*(x_k)\) and \(\breve{u}^{(l)}(x_k)=u^*(x_k)\).
4.2 Speedy value iteration
In order to achieve faster convergence of the iterative cost function, we design a parameter \(\alpha \ge 1\) as the acceleration factor. Then, inspired by [12], the speedy value iteration scheme with constrained control is performed in Algorithm 1. It is worth noting that the entire iterative process is carried out under the time-triggered mechanism.
Remark 3
Observing the above two iteration schemes, we can see that when \(\alpha =1\), the speedy value iteration scheme is equivalent to the traditional value iteration scheme. Therefore, by appropriately increasing the acceleration factor \(\alpha \), the number of iteration steps is greatly reduced when the iterative cost function reaches convergence. However, \(\alpha \) should not be chosen too large, so that the iterative cost function converges to the optimal value rather than diverging.
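The acceleration idea can be illustrated on a scalar problem where the Bellman operator is available in closed form; the update \(p_{l+1}=p_l+\alpha \big (T(p_l)-p_l\big )\) reduces to traditional value iteration at \(\alpha =1\). The scalar LQR data below are assumptions chosen only for illustration, not the paper's examples:

```python
import numpy as np

A, B, Q, R = 0.9, 0.5, 1.0, 1.0    # assumed scalar LQR data

def bellman(p):
    """Exact Bellman (Riccati) update for scalar LQR with V(x) = p*x^2."""
    return Q + A * A * p - (A * B * p) ** 2 / (R + B * B * p)

def speedy_vi(alpha, p0=0.0, tol=1e-10, max_iter=10000):
    """Value iteration with acceleration factor alpha >= 1:
    p_{l+1} = p_l + alpha * (T(p_l) - p_l); alpha = 1 is traditional VI.
    Returns the converged parameter and the number of iterations used."""
    p = p0
    for l in range(1, max_iter + 1):
        p_new = p + alpha * (bellman(p) - p)
        if abs(p_new - p) < tol:
            return p_new, l
        p = p_new
    return p, max_iter
```

On this problem both runs reach the same fixed point, and a moderate \(\alpha >1\) needs visibly fewer iterations, mirroring the role of the acceleration factor in the speedy scheme; an overly large \(\alpha \) would instead make the recursion diverge.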
In the following, the convergence of the iterative cost function sequence \(\{\tilde{\mathscr {V}}^{(l)}(x_k)\}\) is analyzed through a theorem.
Theorem 3
Let the iterative control function \(\tilde{u}^{(l)}(x_k)\) and the iterative cost function \(\tilde{\mathscr {V}}^{(l)}(x_k)\) be obtained by (53) and (54), respectively. Then, suppose there exist scalars \(\eta \), \(\xi _1\), and \(\xi _2\) such that \(0\le \mathscr {V}^{*}(x_{k+1})\le \eta \big (x_k^{\textsf{T}}Qx_k+W(u(x_k))\big )\) and \(0\le \xi _1\mathscr {V}^{*}(x_k)\le \tilde{\mathscr {V}}^{(0)}(x_k)\) \(\le \xi _2\mathscr {V}^{*}(x_k)\), where \(0< \eta < \infty \) and \(0\le \xi _1\le 1< \xi _2<\infty \). If the acceleration factor \(\alpha \) satisfies
where \(0<\mathscr {T}<1\) is a positive scalar and \(\mathscr {L}_{\min }=\min \{1-\xi _1, \xi _2-1\}\), then the iterative cost function \(\tilde{\mathscr {V}}^{(l)}(x_k)\) can approximate the optimal cost function \(\mathscr {V}^{*}(x_k)\) by
Proof
According to (56), one has
which implies
Next, according to the mathematical induction, the left-hand side of (56) can be proved. Letting \(l=1\), we have
Substituting (58) into (59) leads to
Same idea as [12], by recursing \(l-1\) times, the iterative cost function \(\tilde{\mathscr {V}}^{(l)}(x_k)\) satisfies
Similarly, the right half of (56) can be proved by the same method. In particular, when l tends to infinity, one has
which leads to \(\tilde{\mathscr {V}}^{(\infty )}(x_k)=\mathscr {V}^{*}(x_k)\). This completes the proof. \(\square \)
According to the speedy value iteration algorithm, the optimal cost function \(\mathscr {V}^{*}(x_k)\) and the corresponding optimal control law \(u^*(x_k)\) can be easily obtained with fewer iteration steps. In order to further reduce the computational burden in time steps, the static/dynamic ETC method is introduced and performed in Algorithm 2.
5 Simulation studies
In order to support the previous theoretical analysis and further demonstrate the superiority of the proposed algorithm, two experimental examples are provided in this section.
5.1 Example 1
Consider the following inverted pendulum plant:
where the system parameters are provided in Table 2. We set the sampling interval \(\Delta t=0.1\text {s}\), and then the inverted pendulum plant can be discretized into
where the system state \(x_k=[x_{1k},x_{2k}]^{\textsf{T}}=[\pi _k,\varpi _k]^{\textsf{T}}\) and \(x_0=[-1,1]^{\textsf{T}}\). Considering that the system model is unknown, inspired by [19, 28], we build a model network to identify the system dynamics. In addition, some important control parameters are listed in Table 3. In this paper, we can get an acceptable control policy through the offline iterative method, which aims to ensure the controlled performance of the system. Then, in order to verify the influence of the acceleration factor on the iterative convergence speed, we choose three different acceleration factors for the experiment, that is, \(\alpha =1\), \(\alpha =1.5\), and \(\alpha =2\). The corresponding evolution curves of the iterative cost function are displayed in Fig. 3. When \(\alpha =1\), the developed speedy iterative algorithm is equivalent to the traditional method. Hence, it can be easily seen that with the increase of acceleration factor, the convergence speed of the iterative cost function also increases.
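As a sketch of the discretization step, the following applies a forward-Euler rule with \(\Delta t=0.1\,\text {s}\) to a typical inverted-pendulum model. The physical parameters here are assumptions for illustration, since the values in Table 2 are specific to the paper:

```python
import numpy as np

# Assumed pendulum parameters (mass M, length L, gravity G, friction FR);
# the paper's Table 2 values may differ.
M, L, G, FR, DT = 0.5, 1.0, 9.8, 0.2, 0.1

def pendulum_step(x, u):
    """Forward-Euler discretization with Delta_t = 0.1 s of a typical
    inverted pendulum:
        d(theta)/dt = omega,
        d(omega)/dt = (G/L)*sin(theta) - FR/(M*L^2)*omega + u/(M*L^2)."""
    theta, omega = x
    d_omega = (G / L) * np.sin(theta) - FR / (M * L ** 2) * omega + u / (M * L ** 2)
    return np.array([theta + DT * omega, omega + DT * d_omega])
```

The origin is an equilibrium of the discretized model under zero input, which is the regulation target of the designed controllers.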
According to the parameter values given in Table 3, the static triggering condition can be specifically constructed as
Similarly, the dynamic triggering condition can be specifically constructed as
Note that the control law can be updated only when the corresponding triggering condition is violated. The dynamic variable \(\zeta _k\) is nonnegative, which means that the dynamic triggering condition is more difficult to violate than the static triggering condition. Then, the state responses under the three control schemes are shown in Fig. 4, which implies that all system states converge to zero satisfactorily. The control curves under the static ETM and the dynamic ETM are shown in Figs. 5 and 6, respectively. It can be observed that the control curves under these two mechanisms are ladder shaped and that the triggered interval under the dynamic ETM is larger than that under the static ETM. In addition, the control inputs are constrained within \([-2.5,2.5]\). When the designed control schemes are compared with the traditional time-triggered control method, the traditional method cannot even ensure the stability of the controlled system under the same control parameters. Then, the corresponding sampling numbers are given in Fig. 7. The control input under the static ETM is updated 34 times in 100 time steps, whereas it is updated only 14 times under the dynamic ETM.
5.2 Example 2
Consider the following third-order nonlinear dynamics:
where the system state \(x_k=[x_{1k},x_{2k},x_{3k}]^{\textsf{T}}\) with \(x_0=[0.5,0.5,\) \(0.5]^{\textsf{T}}\), the control variable \(u_k=[u_{1k},u_{2k}]^{\textsf{T}}\). Similar to Example 1, some important parameter values of this system are listed in Table 3. We also choose three acceleration factors, that is, \(\alpha =1\), \(\alpha =1.2\), and \(\alpha =1.5\), which aims to verify the effectiveness of the speedy iteration algorithm. The corresponding evolution curves of the iterative cost function are shown in Fig. 8.
According to the parameter values given in Table 3, the static and dynamic triggering conditions can be constructed as
and
respectively. Then, the state responses under the three control schemes are displayed in Fig. 9. Compared with the traditional time-triggered control scheme, the designed ETC schemes also possess a positive convergence effect. The control curves under the static ETM and the dynamic ETM are given in Figs. 10 and 11, respectively. The control inputs under these two control schemes are constrained within \([-1,1]\). Then, the corresponding sampling numbers are shown in Fig. 12. The control inputs under the static ETM are updated 224 times in 500 time steps. In addition, under the dynamic ETM, the control inputs are updated 199 times in 500 time steps. All the experimental results verify the excellent performance of the proposed control methods.
6 Conclusion
In this paper, in order to address the optimal control problem of discrete-time nonlinear dynamics with control constraints and effectively improve the resource utilization rate, we develop two control schemes: static ETC and dynamic ETC. First, a satisfactory static triggering condition is designed from the perspective of stability. Then, on this basis, a dynamic variable is introduced to design a dynamic triggering condition. Note that the control laws under the different control mechanisms are updated only when the corresponding triggering condition is violated. Moreover, in iterative learning, an acceleration factor is introduced to accelerate the iterative convergence. Finally, the effectiveness and superiority of the developed schemes are illustrated by two experimental examples. The experimental results show that the reduction in computational load varies with the controlled system and the tuning parameters when the same control method is applied, and that the dynamic ETC method can further enhance resource utilization compared with the static ETC method. However, the disadvantage of the two ETC methods designed in this paper is that the corresponding triggering conditions need to be checked at every time step. Therefore, in future work, we will study self-triggered control methods for nonlinear systems.
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
Lewis, F.L., Vrabie, D., Vamvoudakis, K.G.: Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers. IEEE Control Syst. Mag. 32(6), 76–105 (2012)
You, L., Jiang, X., Li, B., Zhang, X., Yan, H.: Impulsive layered control of heterogeneous multi-agent systems under limited communication. IEEE Trans. Ind. Inf. 20(3), 5014–5021 (2024)
Rao, J., Wang, J., Xu, J., Zhao, S.: Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces. Nonlinear Dyn. 111, 20041–20053 (2023)
You, L., Jiang, X., Zheng, S., Yan, H.: Communication limited hybrid impulsive control of fuzzy time-delay multiagent network. IEEE Trans. Fuzzy Syst. 32(1), 152–159 (2024)
Huo, Y., Wang, D., Qiao, J., Li, M.: Adaptive critic design for nonlinear multi-player zero-sum games with unknown dynamics and control constraints. Nonlinear Dyn. 111, 11671–11683 (2023)
Lewis, F.L., Vrabie, D.: Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 9(3), 32–50 (2009)
Zhao, M., Wang, D., Qiao, J., Ha, M., Ren, J.: Advanced value iteration for discrete-time intelligent critic control: a survey. Artif. Intell. Rev. 56, 12315–12346 (2023)
Wang, D., Wang, J., Zhao, M., Xin, P., Qiao, J.: Adaptive multi-step evaluation design with stability guarantee for discrete-time optimal learning control. IEEE/CAA J. Autom. Sin. 10(9), 1797–1809 (2023)
Zhao, M., Wang, D., Ha, M., Qiao, J.: Evolving and incremental value iteration schemes for nonlinear discrete-time zero-sum games. IEEE Trans. Cybern. 53(7), 4487–4499 (2023)
Al-Tamimi, A., Lewis, F.L., Abu-Khalaf, M.: Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans. Syst. Man Cybern. B Cybern. 38(4), 943–949 (2008)
Liu, D., Wei, Q.: Generalized policy iteration adaptive dynamic programming for discrete-time nonlinear systems. IEEE Trans. Syst. Man Cybern. Syst. 45(12), 1577–1591 (2015)
Ha, M., Wang, D., Liu, D.: A novel value iteration scheme with adjustable convergence rate. IEEE Trans. Neural Netw. Learn. Syst. 34(10), 7430–7442 (2023)
Wang, D., Gao, N., Liu, D., Li, J., Lewis, F.L.: Recent progress in reinforcement learning and adaptive dynamic programming for advanced control applications. IEEE/CAA J. Autom. Sin. 11(1), 18–36 (2024)
Li, C., Ding, J., Lewis, F.L., Chai, T.: A novel adaptive dynamic programming based on tracking error for nonlinear discrete-time systems. Automatica 129(109687), 1–9 (2021)
Ha, M., Wang, D., Liu, D.: Discounted iterative adaptive critic designs with novel stability analysis for tracking control. IEEE/CAA J. Autom. Sin. 9(7), 1262–1272 (2022)
Wang, D., Zhao, M., Ha, M., Qiao, J.: Intelligent optimal tracking with application verifications via discounted generalized value iteration. Acta Autom. Sin. 48(1), 182–193 (2022)
Yang, Y., Gao, W., Modares, H., Xu, C.Z.: Robust actor-critic learning for continuous-time nonlinear systems with unmodeled dynamics. IEEE Trans. Fuzzy Syst. 30(6), 2101–2112 (2022)
Xu, H., Jagannathan, S., Lewis, F.L.: Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses. Automatica 48(6), 1017–1030 (2012)
Wang, D., Hu, L., Zhao, M., Qiao, J.: Dual event-triggered constrained control through adaptive critic for discrete-time zero-sum games. IEEE Trans. Syst. Man Cybern. Syst. 53(3), 1584–1595 (2023)
Liu, D., Yang, X., Wang, D., Wei, Q.: Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Trans. Cybern. 45(7), 1372–1385 (2015)
Sussmann, H.J., Sontag, E.D., Yang, Y.: A general result on the stabilization of linear systems using bounded controls. IEEE Trans. Autom. Control 39(12), 2411–2425 (1994)
Yang, X., Wei, Q.: Adaptive critic learning for constrained optimal event-triggered control with discounted cost. IEEE Trans. Neural Netw. Learn. Syst. 32(1), 91–104 (2021)
Yang, X., Zhou, B., Mazenc, F., Lam, J.: Global stabilization of discrete-time linear systems subject to input saturation and time delay. IEEE Trans. Autom. Control 66(3), 1345–1352 (2021)
Postoyan, R., Tabuada, P., Nesic, D., Anta, A.: A framework for the event-triggered stabilization of nonlinear systems. IEEE Trans. Autom. Control 60(4), 982–996 (2015)
Tallapragada, P., Chopra, N.: On event triggered tracking for nonlinear systems. IEEE Trans. Autom. Control 58(9), 2343–2348 (2013)
Vamvoudakis, K.G., Mojoodi, A., Ferraz, H.: Event-triggered optimal tracking control of nonlinear systems. Int. J. Robust Nonlinear Control 27(4), 598–619 (2017)
Wang, D., Zhou, Z., Liu, A., Qiao, J.: Event-triggered robust adaptive critic control for nonlinear disturbed systems. Nonlinear Dyn. 111, 19963–19977 (2023)
Wang, D., Ha, M., Qiao, J.: Self-learning optimal regulation for discrete-time nonlinear systems under event-driven formulation. IEEE Trans. Autom. Control 65(3), 1272–1279 (2020)
Ge, X., Han, Q., Wang, Z.: A dynamic event-triggered transmission scheme for distributed set-membership estimation over wireless sensor networks. IEEE Trans. Cybern. 49(1), 171–183 (2019)
Batmani, Y., Davoodi, M., Meskin, N.: Event-triggered suboptimal tracking controller design for a class of nonlinear discrete-time systems. IEEE Trans. Ind. Electron. 64(10), 8079–8087 (2017)
Zhang, K., Zhou, B., Zheng, W., Duan, G.: Event-triggered and self-triggered gain scheduled control of linear systems with input constraints. IEEE Trans. Syst. Man Cybern. Syst. 52(10), 6452–6463 (2022)
Vamvoudakis, K.G.: Event-triggered optimal adaptive control algorithm for continuous-time nonlinear systems. IEEE/CAA J. Autom. Sin. 1(3), 282–293 (2014)
Heemels, W.P.M.H., Donkers, M.C.F., Teel, A.R.: Periodic event-triggered control for linear systems. IEEE Trans. Autom. Control 58(4), 847–861 (2013)
Zhu, Y., Zhao, D., He, H., Ji, J.: Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming. IEEE Trans. Ind. Electron. 64(5), 4101–4109 (2017)
Funding
This work was supported in part by the National Natural Science Foundation of China under Grants 62222301, 61890930-5, and 62021003; in part by the National Science and Technology Major Project under Grants 2021ZD0112302 and 2021ZD0112301; and in part by the Beijing Natural Science Foundation under Grant JQ19013.
Author information
Authors and Affiliations
Contributions
All authors made individual contributions. LH: formal analysis; validation; writing—original draft. DW: investigation; supervision; writing—review and editing. JQ: methodology; supervision.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Hu, L., Wang, D. & Qiao, J. Static/dynamic event-triggered learning control for constrained nonlinear systems. Nonlinear Dyn 112, 14159–14174 (2024). https://doi.org/10.1007/s11071-024-09778-3