1 Introduction

Traditional periodic control uses a time-triggered mechanism in which the control law is executed at every fixed time interval. Such a sampling mechanism greatly increases the computing burden and causes computational waste. Compared with traditional time-triggered control, event-triggered control can utilize computing resources effectively and reduce this waste [1, 2]. Event-triggered control updates the control signal irregularly by setting a trigger condition, on the premise of guaranteeing the system performance [3,4,5]. Under this event-triggered mechanism, the control law is executed only when the trigger condition is satisfied.

Event-triggered control has been widely studied and applied to general linear systems, continuous-time systems, discrete-time systems and other fields [6,7,8,9,10,11]. For instance, Zhang et al. [7] proposed a state-feedback event-triggered control method for linear systems. Tabuada [8] introduced an event-triggered mechanism for nonlinear systems, which makes the Lyapunov function decrease strictly along the solution curve. Qiu et al. [10] designed a fuzzy event-triggered controller for pure-feedback nonlinear systems with unknown states. Recently, many scholars have applied event-triggered control to the adaptive dynamic programming (ADP) algorithm [12,13,14,15,16,17,18]. Zhong et al. [15] proposed an event-triggered mechanism for continuous-time nonlinear systems and studied the unknown system dynamics with a neural network (NN) observer. The work in [16] gave an event-triggered ADP control method and designed a dynamic NN structure to identify the internal states of continuous-time systems. In [17], Yang et al. used an NN structure to reconstruct the angular position and angular velocity signals of a robot arm, and introduced an event-triggered ADP control method to approximate the performance index of the robot.

The ADP algorithm is a useful method for iteratively solving the optimal control problem of various systems, and it satisfies Bellman's principle of optimality [19,20,21,22,23]. In 1977, Werbos [19] first proposed the framework of the ADP algorithm. The main idea of this method is to use a function approximation structure (such as a neural network, a fuzzy model, or a polynomial) to approximate the cost function and the control law. Subsequently, Murray et al. [20] presented a specific ADP iterative algorithm for continuous systems and gave strict proofs of stability and convergence. Prokhorov and Wunsch [24] summarized the main structures of the ADP algorithm as heuristic dynamic programming (HDP), dual heuristic dynamic programming (DHP), globalized DHP (GDHP), and their action-dependent variants. On the basis of previous studies, Jiang et al. [21] introduced a novel policy iteration algorithm that does not rely on the system dynamics. Al-Tamimi et al. [22] proposed a value-iteration-based ADP algorithm for discrete-time nonlinear systems with unknown internal dynamics. In [23], a greedy ADP algorithm was proposed to solve the tracking control problem for discrete-time nonlinear systems by converting the optimal tracking problem into an optimal regulation problem.

Control constraints widely exist in practical systems; they can easily degrade the overall performance of the system and may even lead to instability [25,26,27,28]. As one of the powerful methods for solving the optimal control problem of nonlinear systems, the ADP method also plays an important role for systems with control constraints. Na et al. [29] proposed a novel online control policy for constrained nonlinear systems based on an iterative ADP algorithm. Fan et al. [30] solved the output-constrained optimal control problem for continuous-time nonlinear systems. In addition, researchers have applied iterative ADP event-triggered control to a class of constrained-input systems. For continuous-time constrained nonlinear systems, Zhu et al. [31] introduced an event-triggered optimal control policy and gave a detailed Lyapunov analysis. The work in [32] considered the global stability of saturated systems and proposed a state-dependent non-quadratic event-triggered control method. In [33], an event-triggered state feedback control policy was provided for constrained linear systems; the positive lower bounds and the self-triggered method were also given.

Recently, scholars have proposed many event-triggered methods for linear systems, but few studies focus on discrete-time nonlinear systems. Moreover, for discrete-time nonlinear systems, researchers usually adopt the basic structures of the ADP algorithm (such as HDP and DHP) and do not consider the constrained-input problem. Motivated by this, we propose a novel ADP-based event-triggered approximate optimal control method for discrete-time nonlinear systems with control constraints. In this paper, the globalized dual heuristic dynamic programming (GDHP) structure is designed to learn the event-triggered optimal control, in which both the cost function and its partial derivative are learned by the critic network. Compared with the HDP and DHP structures, the GDHP structure learns more system information, which enables it to obtain better control performance. The contributions of this paper are summarized as follows: (1) the event-triggered design is developed on the GDHP technique for discrete-time nonlinear systems, where the control law generated by the action network is updated only at the triggering instants; (2) the control constraints are explicitly considered in the event-triggered design based on the GDHP structure, and the stability of the event-triggered constrained system is proved.

The rest of this paper is organized as follows. Section 2 introduces the event-triggered control problem for a class of discrete-time nonlinear systems with control constraints. The trigger condition and the corresponding stability analysis are given in Sect. 3. The approximate optimal learning method based on the GDHP structure and the detailed iterative process are discussed in Sect. 4. Section 5 presents three simulation examples to demonstrate the effectiveness of the proposed method. Finally, Sect. 6 gives the conclusion and discussion.

2 Problem formulation

Consider a class of discrete-time nonlinear systems described as

$$\begin{aligned} {x_{k + 1}} = f({x_k}) + g({x_k}){u_k}, \end{aligned}$$
(1)

where \({x_k} = {\left[ {{x_{1k}},{x_{2k}}, \ldots ,{x_{nk}}} \right] ^T}\in {\mathbb {R}^n}\) is the state vector and \({u_k} = {\left[ {{u_{1k}},{u_{2k}}, \ldots ,{u_{mk}}} \right] ^T} \in {\mathbb {R}^m}\) is the control input vector. For any \(x_k\), \(f(x_k):{\mathbb {R}^n} \rightarrow {\mathbb {R}^n}\) is differentiable with \(f(0) = 0\), and \(g(x_k):{\mathbb {R}^n} \rightarrow {\mathbb {R}^{n\times m}}\) is nonsingular. Assume that system (1) is Lipschitz continuous on a set \(\varOmega \subset {\mathbb {R}^n}\) containing the origin and that system (1) is controllable in the sense that there exists a continuous control on \(\varOmega \) that stabilizes the system. Define \({\varOmega _u} = \{ u_k \mid u_k = {[{u_{1k}},{u_{2k}}, \ldots ,{u_{mk}}]^T} \in {\mathbb {R}^m},\ \left| {{u_{ik}}} \right| \le {{\overline{u}} _i},\ i = 1,2, \ldots ,m\} \), where \({{\overline{u}} _i}\) is the saturation bound of the ith actuator. \({\overline{U}} \in {\mathbb {R}^{m \times m}}\) is the constant diagonal matrix given by \({\overline{U}} = \mathrm{diag}\left[ {{{{\overline{u}} }_1},{{{\overline{u}} }_2}, \ldots ,{{{\overline{u}} }_m}} \right] \).

In event-triggered control, we denote the sampling instants by \(\{ {k_i}\} _{i = 0}^\infty \), which means the controller only samples at the discrete time points \({k_0},{k_1},{k_2}, \ldots \). The state feedback control law \(u_k\) satisfies

$$\begin{aligned} u_k = \mu ({x_{k_{i}}}), \end{aligned}$$
(2)

where \({x_{k_{i}}}\) represents the state vector at the sampling instant, with \({k_i} \le k < {k_{i + 1}},\ i = 0,1,2, \ldots \). In addition, a zero-order-hold (ZOH) device is designed to maintain the control input during the trigger interval. Thus, a piecewise-constant control input sequence is obtained through the ZOH.

Define the event-triggered error as

$$\begin{aligned} {e_k} = {x_{k_{i}}} - {x_k}; ~~ k \in [{k_i},{k_{i + 1}}), \end{aligned}$$
(3)

where \({x_{k_{i}}}\) represents the state at the sampling instant and \(x_k\) represents the current state. Combining (2) and (3), we obtain

$$\begin{aligned} {u_k} = \mu ({e_k} + {x_k}). \end{aligned}$$
(4)

Then, substituting (4) into (1), we have

$$\begin{aligned} {x_{k + 1}} = f({x_k}) + g({x_k})\mu ({e_k} + {x_k}). \end{aligned}$$
(5)
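
As a minimal illustration of how (3)–(5) fit together (not taken from the paper; `f`, `g` and `mu` are placeholder callables), one closed-loop step can be sketched as:

```python
import numpy as np

def closed_loop_step(f, g, mu, x_k, x_sample):
    """One step of the event-triggered closed loop (5).
    x_sample is the state x_{k_i} held since the latest sampling instant,
    so the ZOH supplies mu(x_{k_i}) = mu(e_k + x_k) between triggers."""
    e_k = x_sample - x_k                            # event-triggered error (3)
    u_k = mu(e_k + x_k)                             # control law (4)
    return f(x_k) + g(x_k) @ np.atleast_1d(u_k)     # next state, eq. (5)
```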

The general discrete-time optimal control problem is to find a control law \(u_k\) that minimizes the following infinite-horizon cost function

$$\begin{aligned} V({x_k}) = \sum \limits _{j = k}^\infty {U({x_j},\mu ({e_j} + {x_j}))}, \end{aligned}$$
(6)

where \(\mu ({e_j} + {x_j})=\mu ({x_{k_{i}}})\), and \(U({x_j},\mu ({e_j} + {x_j}))\) is the utility function. The utility function usually takes the quadratic form

$$\begin{aligned} U({x_k},\mu ({x_{k_{i}}})) = x_k^TQ{x_k} + {\mu ^T}({x_{k_{i}}})R\mu ({x_{k_{i}}}), \end{aligned}$$
(7)

where Q and R are symmetric positive definite matrices with appropriate dimensions, and \(U(0,0) = 0\). However, such a quadratic utility function is not suitable for systems with control constraints. Thus, a non-quadratic form is adopted to handle the constrained-input problem, and the utility function becomes

$$\begin{aligned}&U({x_k},\mu ({x_{k_{i}}})) = x_k^TQ{x_k}\nonumber \\&\quad + 2{\int _0^{\mu ({x_{k_{i}}})}\varphi ^{ - T}}\left( {{{{\overline{U}}}^{-1}}\tau } \right) {\overline{U}} R\mathrm{d}\tau , \end{aligned}$$
(8)

where \(\tau \in {\mathbb {R}^m}\) and \(\varphi ( \cdot )\in {\mathbb {R}^m}\) is a bounded one-to-one function satisfying \(\left| {\varphi ( \cdot )} \right| \le 1\). In the following, \(U({x_k},\mu ({x_{k_{i}}}))\) is abbreviated as \(U_k\).
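
For concreteness, the non-quadratic utility (8) can be evaluated numerically as sketched below for the single-input case used in the simulations, assuming the common choice \(\varphi (\cdot )=\tanh (\cdot )\) (the paper only requires \(\varphi \) to be bounded, one-to-one, and \(|\varphi (\cdot )|\le 1\)); the trapezoidal quadrature is an implementation detail, not part of the method.

```python
import numpy as np

def utility(x, mu, Q, R, u_bar, n_quad=200):
    """Sketch of the non-quadratic utility (8) for a scalar input (m = 1),
    with phi = tanh so that phi^{-1} = arctanh.
    x: state x_k, mu: held control mu(x_{k_i}) with |mu| < u_bar, R: scalar weight."""
    x = np.asarray(x, dtype=float)
    taus = np.linspace(0.0, mu, n_quad)              # integration path 0 -> mu
    vals = np.arctanh(taus / u_bar) * u_bar * R      # phi^{-T}(U_bar^{-1} tau) * U_bar * R
    control_cost = 2.0 * np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(taus))
    return float(x @ Q @ x) + control_cost           # x^T Q x + non-quadratic control term
```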

Based on the Bellman optimality principle, the optimal cost function \({V^*}({x_k})\) satisfies

$$\begin{aligned} V^*(x_k)=\mathop {\min }\limits _{\mu (x_{k_{i}})}\left\{ U_k+V^*(x_{k+1})\right\} . \end{aligned}$$
(9)

In addition, the control law \({u_{k}}\) satisfies the first-order necessary condition of optimal control [34]. For \(k \in [{k_i},{k_{i + 1}})\), \(i = 0,1,2 \ldots \), the optimal control law \({{\mu }^{*}}(x_{k_i})\) can be obtained as

$$\begin{aligned}&{{\mu }^{*}}(x_{k_i}) =\underset{\mu (x_{k_{i}})}{\mathop {\arg \min }}\,\left\{ U_k+V^*(x_{k+1})\right\} \nonumber \\&\quad = \overline{U}\varphi \left( -\frac{1}{2}{{\left( \overline{U}R \right) }^{-1}}{{g}^{T}}\left( {{x}_{k}} \right) \frac{\partial {{V}^{*}}({{x}_{k+1}})}{\partial {{x}_{k+1}}} \right) . \end{aligned}$$
(10)
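
A minimal sketch of evaluating (10) is given below, again assuming \(\varphi =\tanh \); `dV_next` stands for the gradient \(\partial {V^*}(x_{k+1})/\partial x_{k+1}\), which in practice is supplied by the critic network described in Sect. 4 (names are illustrative).

```python
import numpy as np

def constrained_control(dV_next, g_x, R, u_bar):
    """Constrained control law (10) with phi = tanh:
        mu* = U_bar * tanh( -1/2 * (U_bar R)^{-1} * g(x_k)^T * dV*/dx_{k+1} ).
    dV_next: gradient of the optimal cost at x_{k+1}, shape (n,)
    g_x:     input matrix g(x_k), shape (n, m)
    R:       control weighting matrix, shape (m, m)
    u_bar:   saturation bounds, length m."""
    U_bar = np.diag(np.atleast_1d(u_bar))
    inner = -0.5 * np.linalg.solve(U_bar @ R, g_x.T @ dV_next)
    return U_bar @ np.tanh(inner)
```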

Next, we will give an event-triggered condition and prove the corresponding stability for system (5).

3 Stability proof under the event-triggered condition

Definition 1

(cf. [35]) For any \({x_k} \in \varOmega \), a control law \(u_k\) is admissible with respect to (6) on \(\varOmega \) if \(u_k\) is continuous and stabilizes (1) on \(\varOmega \), \(u_k = 0\) when \(x_k = 0\), and \(V(x_0)\) is finite for every \({x_0} \in \varOmega \). For the constant diagonal matrix \({\overline{U}} \in {\mathbb {R}^{m \times m}}\), in the single-input case \(m=1\) we write \({\overline{u}}= {\overline{U}}\), so that \(\left\| {\mu ({x_{k_i}})} \right\| \le {\overline{u}}\).

For system (5), we define the event-triggered condition \(\left\| {{e_k}} \right\| \le {e_T}\), where \(e_T\) is the trigger threshold. During the event-triggered control, the action network updates the corresponding control law only when this condition is violated. Besides, at each sampling instant \(k_i\), \(i = 0,1,2, \ldots \), the trigger error \(\left\| {{e_k}} \right\| \) is reset to zero.

Assumption 1

(cf. [36]) The function \(V:{\mathbb {R}^n} \rightarrow {\mathbb {R}_{\ge 0}}\) is continuously differentiable, and the state vector \(x_k\) and the trigger error \(e_k\) satisfy

$$\begin{aligned}&\left\| {f({x_k}-{e_k})} \right\| \le P_1\left\| {{e_k}} \right\| + P_1\left\| {{x_k}} \right\| , \end{aligned}$$
(11)
$$\begin{aligned}&\left\| {g({x_k}-{e_k})} \right\| \le P_2\left\| {{e_k}} \right\| + P_2\left\| {{x_k}} \right\| , \end{aligned}$$
(12)
$$\begin{aligned}&{\alpha _1}(\left\| x \right\| ) \le V({x_k}) \le {\alpha _2}(\left\| x \right\| ),\mathrm{{ }}~~~\forall x \in {\mathbb {R}^n} \end{aligned}$$
(13)
$$\begin{aligned}&V\left( x_{k+1} \right) - V({x_k})\le - \alpha V({x_k}) + \beta \left\| {{e_k}} \right\| , \end{aligned}$$
(14)
$$\begin{aligned}&\alpha _1^{ - 1}(\left\| x \right\| ) \le {L}\left\| x \right\| . \end{aligned}$$
(15)

where \(L\), \(P_1\), \(P_2\), \(\alpha \), and \(\beta \) are positive constants, and \(\alpha _1 \) and \(\alpha _2 \) are class \({\mathcal {K}_\infty }\) functions.

In particular, if (13) and (14) hold, the function V is called an input-to-state stability Lyapunov (ISS-Lyapunov) function [37].

According to (3), for each \(k \in [{k_i},{k_{i + 1}})\), we have

$$\begin{aligned} e_{k + 1} = x_{k_i} - x_{k + 1}, \end{aligned}$$
(16)

where \({k_i}\) is the latest sampling instant. In addition, according to [36], we have

$$\begin{aligned} \left\| {{e_{k + 1}}} \right\| \le \left\| {{x_{k + 1}}} \right\| . \end{aligned}$$
(17)

From Assumption 1, by applying (3) and (5) into (17), we have

$$\begin{aligned}&\left\| e_{k+1} \right\| \le \left\| f(x_k)+ g(x_k)\mu (x_{k_i})\right\| \nonumber \\&\quad \le \left\| f(x_k)\right\| +\left\| g(x_k) \right\| \overline{u}\nonumber \\&\quad =\left\| f(x_{k_i}-e_k)\right\| +\left\| g(x_{k_i}-e_k) \right\| \overline{u}\nonumber \\&\quad \le \,(P_1+P_2\overline{u})\left\| x_{k_i} \right\| +(P_1+P_2\overline{u})\left\| e_k \right\| \nonumber \\&\quad \le \,(P_1+P_2\overline{u})\left\| x_{k_i} \right\| +2(P_1+P_2\overline{u})\left\| e_k \right\| . \end{aligned}$$
(18)

Therefore, we can obtain

$$\begin{aligned}&\left\| {{e_k}} \right\| \le 2(P_1+P_2\overline{u})\left\| {{e_{k - 1}}} \right\| +(P_1+P_2\overline{u})\left\| {{x_{k_{i}}}} \right\| \nonumber \\&\quad \le 2(P_1+P_2\overline{u})(2(P_1+P_2\overline{u})\left\| {{e_{k - 2}}} \right\| \nonumber \\&\quad + (P_1+P_2\overline{u})\left\| {{x_{k_{i}}}} \right\| ) + (P_1+P_2\overline{u})\left\| {{x_{k_{i}}}} \right\| \cdots \nonumber \\&\quad \le {(2(P_1+P_2\overline{u}))^{k - {k_i}}}\left\| {{e_{k_{i}}}} \right\| \nonumber \\&\quad + {(2(P_1+P_2\overline{u}))^{k - {k_i} - 1}} (P_1+P_2\overline{u})\left\| {{x_{k_{i}}}} \right\| \nonumber \\&\quad + {(2(P_1+P_2\overline{u}))^{k - {k_i} - 2}} (P_1+P_2\overline{u})\left\| {{x_{k_{i}}}} \right\| \nonumber \\&\quad + \cdots + (P_1+P_2\overline{u})\left\| {{x_{k_{i}}}} \right\| . \end{aligned}$$
(19)

Setting the initial condition \({e_{k_{i}}} = 0\), Eq. (19) can be solved as

$$\begin{aligned} \left\| {{e_k}} \right\| \le \frac{{1 - {{(2(P_1+P_2\overline{u}))}^{k - k_{i}}}}}{{1 - 2(P_1+P_2\overline{u})}}(P_1+P_2\overline{u})\left\| {{x_{k_{i}}}} \right\| . \end{aligned}$$
(20)

Thus, we take (20) as the event-triggered condition, that is,

$$\begin{aligned} \left\| {{e_k}} \right\|&\le {e_T}\nonumber \\&\quad = \frac{{1 - {{(2(P_1+P_2\overline{u}))}^{k - k_{i}}}}}{{1 - 2(P_1+P_2\overline{u})}}(P_1+P_2\overline{u})\left\| {{x_{k_{i}}}} \right\| . \end{aligned}$$
(21)
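
The threshold in (21) depends only on the constant \(P_1+P_2\overline{u}\), the elapsed time \(k-k_i\), and the sampled state, so checking it online is inexpensive. A sketch with illustrative names is given below (in the simulations of Sect. 5, \(P_1+P_2\overline{u}\) is 0.2 or 0.1, so the ratio \(2(P_1+P_2\overline{u})\) stays below one):

```python
import numpy as np

def trigger_threshold(x_sample, k, k_i, p_sum):
    """Trigger threshold e_T in (21), with p_sum = P1 + P2 * u_bar."""
    r = 2.0 * p_sum                                   # geometric ratio from (19)-(21)
    return (1.0 - r ** (k - k_i)) / (1.0 - r) * p_sum * np.linalg.norm(x_sample)

def needs_resampling(x, x_sample, k, k_i, p_sum):
    """Return True when ||e_k|| exceeds e_T, i.e. condition (21) is violated."""
    return np.linalg.norm(x_sample - x) > trigger_threshold(x_sample, k, k_i, p_sum)
```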

In the following, we give the stability proof of system (5) with control constraints under condition (21).

Theorem 1

According to Assumption 1, if \(0\le P_1+P_2\overline{u}\le 1\) and the function \(V({x_k})\) satisfies

$$\begin{aligned} V({x_k})&\le V({x_{k_i + 1}})\nonumber \\&\quad = -\xi \alpha V({x_{k_{i}}})({k_{i + 1}} - {k_i}) + V({x_{k_{i}}}), \end{aligned}$$
(22)

for \(k \in [{k_i},{k_{i + 1}})\), \(i = 0,1,2 \ldots \), where \(\xi \in (0,1)\), the event-triggered control system (5) with control constraints is asymptotically stable.

Proof

By (13) and (15), one can get

$$\begin{aligned} \left\| {{x_{k_{i}}}} \right\| \le \alpha _1^{ - 1}(V({x_{k_{i}}})) \le LV({x_{k_{i}}}). \end{aligned}$$
(23)

Substituting (20) into (14), one has

$$\begin{aligned}&V(x_{k+1}) - V({x_k})\le - \alpha V({x_k}) \nonumber \\&\quad + \beta \frac{{1 - {{(2(P_1+P_2\overline{u}))}^{k - {k_i}}}}}{{1 - 2(P_1+P_2\overline{u})}}(P_1+P_2\overline{u})\left\| {{x_{k_{i}}}} \right\| . \end{aligned}$$
(24)

Then, considering (22) and (23), one can define

$$\begin{aligned} \psi _k = \beta \frac{{1 - {{(2(P_1+P_2\overline{u}))}^{k - {k_i}}}}}{{1 - 2(P_1+P_2\overline{u})}}L(P_1+P_2\overline{u}). \end{aligned}$$
(25)

Equation (24) can be written as

$$\begin{aligned} V({x_{k + 1}}) \le (1 - \alpha )V({x_k}) + {\psi _k}V({x_{k_{i}}}). \end{aligned}$$
(26)

Therefore, one obtains

$$\begin{aligned} V({x_k}) \le&\, (1 - \alpha )V({x_{k - 1}}) + {\psi _{k - 1}}V({x_{k_{i}}})\nonumber \\ \le&\, (1 - \alpha )(1 - \alpha )V({x_{k - 2}}) + {\psi _{k - 2}}V({x_{k_{i}}})\nonumber \\&\,+ {\psi _{k - 1}}V({x_{k_{i}}}) \cdots \nonumber \\ \le&\,{(1 - \alpha )^{k - {k_i}}}V({x_{k_{i}}}) \nonumber \\ {}&\,+{(1 - \alpha )^{k - {k_i} - 1}}{\psi _{k_{i}}}V({x_{k_{i}}}) + \cdots \nonumber \\&\, + (1 - \alpha ){\psi }_{k - 2}V({x_{k_{i}}}) + {\psi _k}V({x_{k_{i}}}). \end{aligned}$$
(27)

According to Theorem 1, the sequence \({\psi _k}\) is monotonically increasing with a positive common ratio. Then, (27) can be solved as

$$\begin{aligned} V({x_k}) \le&\,(1 - \alpha {)^{k - {k_i}}}V({x_{k_{i}}}) \nonumber \\&\quad + {\psi _k}\frac{{1 - {{(1 - \alpha )}^{k - {k_i}}}}}{\alpha }V({x_{k_{i}}}). \end{aligned}$$
(28)

Based on (22), one has

$$\begin{aligned} \begin{aligned} {V{({x_k})}} \le - \xi \alpha V({x_{k_{i}}})(k - {k_i})+ V({x_{k_{i}}}). \end{aligned} \end{aligned}$$
(29)

To simplify the calculation, one can define

$$\begin{aligned} \begin{aligned} {M {({x_k})}} = - \xi \alpha V({x_{k_{i}}})(k - {k_i})+ V({x_{k_{i}}}). \end{aligned} \end{aligned}$$
(30)

Therefore, one has

$$\begin{aligned} V({x_k}) \le M ({x_k}). \end{aligned}$$
(31)

From (30), the first difference of \(M({x_k})\) can be obtained as

$$\begin{aligned} \varDelta M&= M ({x_{k + 1}}) - M ({x_k})\nonumber \\&\quad = - \xi \alpha V({x_{k_{i}}}). \end{aligned}$$
(32)

Then, substituting (13) into (32), one obtains

$$\begin{aligned} \varDelta M \le - \xi \alpha \,{\alpha _1}(\left\| {x_{k_i}} \right\| ) < 0. \end{aligned}$$
(33)

This completes the proof. \(\square \)

From the above derivation, it can be concluded that the event-triggered control system (5) with control constraints is asymptotically stable.

4 Event-triggered controller design based on GDHP structure

In this section, we apply the event-triggered condition to the GDHP structure. Since the actual optimal control law cannot be obtained exactly, an iterative stopping criterion is designed to obtain an approximate optimal control law. An iterative process is completed only when the stopping condition is satisfied. Then, the trigger error between the sampled state and the current state is compared with the trigger threshold online. When the designed trigger condition is violated, the current control law is resampled. Otherwise, the control law is maintained by the ZOH.

This section is divided into three parts. First, the approximate optimal event-triggered controller is designed. Then, the NN implementation of the GDHP technique is described. Finally, the iterative stopping criterion is provided.

4.1 Event-triggered controller design

The event-triggered controller design based on the GDHP structure is displayed in Fig. 1. The proposed event-triggered GDHP method is implemented with three NN structures: the model network is designed to identify the system dynamics, the critic network approximates the cost function and its partial derivative, and the action network generates the event-triggered approximate optimal control law.

In addition, a sensor device is designed to judge whether the trigger condition is violated. When the trigger condition is violated, the current time is set to the sampling instant \({k_i}\), \(i = 1,2, \ldots \), and the control law \(\mu ({x_{k_i}})\) is maintained by the ZOH device during \({k_i} \le k < {k_{i + 1}}\).
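
To summarize the interplay shown in Fig. 1, the following sketch (illustrative names and simplified interfaces, reusing `trigger_threshold` from Sect. 3) outlines one possible online loop: the sensor compares \(\Vert e_k\Vert \) with \(e_T\), the action network is queried only at the trigger instants, and the ZOH holds the control in between.

```python
import numpy as np

def online_event_triggered_loop(plant, action_net, x0, steps, p_sum):
    """Illustrative online procedure for the event-triggered controller of Fig. 1.
    plant(x, u)   -> next true state (the model and critic networks are trained
                     as in Sect. 4 and are hidden behind action_net here)
    action_net(x) -> approximate constrained control mu_hat(x)."""
    x = np.asarray(x0, dtype=float)
    x_sample, k_i = x.copy(), 0                 # sampling instant k_0
    u_hold = action_net(x_sample)               # control held by the ZOH
    history = []
    for k in range(steps):
        e_norm = np.linalg.norm(x_sample - x)                     # trigger error (3)
        if e_norm > trigger_threshold(x_sample, k, k_i, p_sum):   # condition (21) violated
            x_sample, k_i = x.copy(), k                           # new sampling instant
            u_hold = action_net(x_sample)                         # action network fires
        history.append((x.copy(), u_hold))
        x = plant(x, u_hold)                                      # ZOH keeps u constant otherwise
    return history
```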

Fig. 1 Event-triggered control system based on the GDHP structure

4.2 NN implementation of GDHP technique

4.2.1 Model network

For an unknown system, before performing the iterative calculation, a model network is first constructed to identify the system dynamics. The number of hidden layer neurons is set as \({N_m}\). Let \({\upsilon _{m}}\) denote the input-to-hidden weight matrix and \({\omega _{m}}\) the hidden-to-output weight matrix. Based on the state vector \({x_k}\) and the control law \( \mu ({x_k})\), which together form the network input \(x_{mk}\), the state vector \({{\widehat{x}}_{k + 1}}\) at the next time step is obtained as

$$\begin{aligned} {{\widehat{x}}_{k + 1}} = \omega _{m}^T\phi (\upsilon _{m}^T{x_{mk}}), \end{aligned}$$
(34)

where \(\phi ( \cdot ) \in {\mathbb {R}^{{N_m}}}\) is the activation function, which satisfies

$$\begin{aligned} \phi (a) = \frac{{1 - \exp ( - a)}}{{1 + \exp ( - a)}}. \end{aligned}$$
(35)

Set the error function as

$$\begin{aligned} {e_{mk}} = {{\widehat{x}}_{k + 1}} - {x_{k + 1}} \end{aligned}$$
(36)

and the objective error function as

$$\begin{aligned} {E_{mk}} = \frac{1}{2}e_{mk}^T{e_{mk}}. \end{aligned}$$
(37)

In the iterative training of the model network, the weights are updated based on the gradient descent rule as follows:

$$\begin{aligned} \omega _{m{(k + 1)}} =&\,\omega _{mk} - {\vartheta _m}\left[ \frac{{\partial {E_{mk}}}}{{\partial \omega _{mk}}}\right] , \end{aligned}$$
(38)
$$\begin{aligned} \upsilon _{m{(k + 1)}} =&\, \upsilon _{mk} - {\vartheta _m}\left[ \frac{{\partial {E_{mk}}}}{{\partial \upsilon _{mk}}}\right] , \end{aligned}$$
(39)

where \({\vartheta _m}\) is the learning rate. Note that the weights are kept unchanged after sufficient training and are used in the subsequent training of the critic and action networks.
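
A minimal NumPy sketch of the model network (34)–(39) is given below (an illustration, not the authors' code); the network input is taken as the concatenation of \(x_k\) and \(\mu (x_k)\), which is consistent with the 3–8–2 structure used in case 1, and the weight gradients follow from the chain rule.

```python
import numpy as np

def phi(a):
    """Bipolar sigmoid activation (35)."""
    return (1.0 - np.exp(-a)) / (1.0 + np.exp(-a))

def dphi(a):
    """Derivative of (35), used for backpropagation: phi'(a) = (1 - phi(a)^2) / 2."""
    return 0.5 * (1.0 - phi(a) ** 2)

class ModelNetwork:
    """Three-layer model network: x_hat_{k+1} = omega_m^T phi(upsilon_m^T x_mk), eq. (34)."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.v = rng.uniform(-0.1, 0.1, (n_in, n_hidden))   # input-to-hidden weights
        self.w = rng.uniform(-0.1, 0.1, (n_hidden, n_out))  # hidden-to-output weights
        self.lr = lr                                        # learning rate theta_m

    def predict(self, x, u):
        x_m = np.concatenate([np.atleast_1d(x), np.atleast_1d(u)])  # x_mk = [x_k; mu(x_k)]
        h = phi(self.v.T @ x_m)
        return self.w.T @ h, (x_m, h)

    def train_step(self, x, u, x_next):
        """One gradient-descent update (38)-(39) on the error (36)-(37)."""
        x_hat, (x_m, h) = self.predict(x, u)
        e = x_hat - np.asarray(x_next, dtype=float)          # e_mk, eq. (36)
        grad_w = np.outer(h, e)                              # dE_mk / d omega_m
        grad_v = np.outer(x_m, (self.w @ e) * dphi(self.v.T @ x_m))  # dE_mk / d upsilon_m
        self.w -= self.lr * grad_w
        self.v -= self.lr * grad_v
        return 0.5 * float(e @ e)                            # E_mk, eq. (37)
```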

4.2.2 Critic network

It is well known that the critic network of the HDP structure learns the cost function \({V}({x_k})\), whereas the critic network of the DHP structure learns its partial derivatives \({{\partial {V}({x_k})}/ {\partial {x_k}}}\). In the GDHP structure, the critic network learns both the cost function \({V}({x_k})\) and its partial derivatives \({{\partial {V}({x_k})} / {\partial {x_k}}}\). Because the GDHP structure learns more information about the system, better control performance can be obtained by this method. For simplicity, the partial derivative is denoted as \({\lambda }({x_k}) = {{\partial {V}({x_k})} / {\partial {x_k}}}\).

In the critic network, the outputs can be obtained as

$$\begin{aligned} {{\widehat{V}}}({x_{k_{i}}}) =&\,\omega _{c}^{VT}\phi (\upsilon _{c}^T{x_{k_{i}}}), \end{aligned}$$
(40)
$$\begin{aligned} {{\widehat{\lambda }}}({x_{k_{i}}}) =&\, \omega _{c}^{\lambda T}\phi (\upsilon _{c}^T{x_{k_{i}}}), \end{aligned}$$
(41)

where \({\upsilon _c}\) denotes the input-to-hidden weight matrix and \({\omega _c}\) the hidden-to-output weight matrix. The target functions can be expressed as

$$\begin{aligned} {V}({x_{k_{i}}}) =&\,U_k + {{\widehat{V}}}({{\widehat{x}}_{k_{i} + 1}}), \end{aligned}$$
(42)
$$\begin{aligned} {\lambda }({x_{k_{i}}}) =&\,2Q{x_k} + 2\left( \frac{\partial {{\mu }}({{x}_{{{k}_{i}}}})}{\partial {{x}_{{{k}_{i}}}}} \right) \overline{U}R{{\varphi }^{-1}}({{\overline{U}}^{-1}}{{\widehat{\mu }}}({{x}_{{{k}_{i}}}}))\nonumber \\ {}&+ {(\frac{{\partial {{{\widehat{x}}}_{k_{i} + 1}}}}{{\partial {x_{k_{i}}}}} + \frac{{\partial {{{\widehat{x}}}_{k_{i} + 1}}}}{{\partial {{{\widehat{\mu }} }}({x_{k_{i}}})}}\frac{{\partial {{{\widehat{\mu }} }}({x_{k_{i}}})}}{{\partial {x_{k_{i}}}}})^T} \nonumber \\ {}&\times {{\widehat{\lambda }}}({{\widehat{x}}_{k_{i} + 1}}). \end{aligned}$$
(43)

Hence, the error function can be obtained as

$$\begin{aligned} e_{ck}^V =&{{\widehat{V}}}({x_{k_{i}}}) - {V}({x_{k_{i}}}), \end{aligned}$$
(44)
$$\begin{aligned} e_{ck}^\lambda =&\, {{\widehat{\lambda }} }({x_{k_{i}}}) - {\lambda }({x_{k_{i}}}). \end{aligned}$$
(45)

The minimized objective error function is

$$\begin{aligned} {E_{ck}} = (1 - \rho )\left( \frac{1}{2}e_{ck}^{VT}e_{ck}^V\right) + \rho \left( \frac{1}{2}e_{ck}^{\lambda T}e_{ck}^\lambda \right) . \end{aligned}$$
(46)

According to (46) and the gradient descent rule, the weights of the critic network are updated as

$$\begin{aligned} \omega _{c(k+1)} =&\,\omega _{ck} - {\vartheta _c}\left[ (1 - \rho )\frac{{\partial E_{ck}^V}}{{\partial \omega _{ck}}} + \rho \frac{{\partial E_{ck}^\lambda }}{{\partial \omega _{ck}}}\right] , \end{aligned}$$
(47)
$$\begin{aligned} \upsilon _{c(k+1)} =&\,\upsilon _{ck} - {\vartheta _c}\left[ (1 - \rho )\frac{{\partial E_{ck}^V}}{{\partial \upsilon _{ck}}} + \rho \frac{{\partial E_{ck}^\lambda }}{{\partial \upsilon _{ck}}}\right] , \end{aligned}$$
(48)

where \({\vartheta _c} > 0\) is the learning rate and \(0 \le \rho \le 1\) is a constant that balances the HDP and DHP components of the GDHP structure: when \(\rho = 0\), the structure reduces to pure HDP, and when \(\rho = 1\), it reduces to pure DHP.
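
The distinguishing feature of the GDHP critic is the blended objective (46). A sketch of the corresponding update (47)–(48) is given below, reusing `phi` and `dphi` from the model-network sketch and taking the targets \(V(x_{k_i})\) and \(\lambda (x_{k_i})\) from (42)–(43) as given inputs (the names and class interface are illustrative).

```python
import numpy as np

class CriticNetwork:
    """GDHP critic: shared hidden layer with a scalar V head (40) and a vector lambda head (41)."""

    def __init__(self, n_in, n_hidden, lr=0.01, rho=0.5, seed=1):
        rng = np.random.default_rng(seed)
        self.v = rng.uniform(-0.5, 0.5, (n_in, n_hidden))    # input-to-hidden weights
        self.wV = rng.uniform(-0.5, 0.5, n_hidden)           # output weights for V_hat
        self.wL = rng.uniform(-0.5, 0.5, (n_hidden, n_in))   # output weights for lambda_hat
        self.lr, self.rho = lr, rho

    def forward(self, x):
        x = np.asarray(x, dtype=float)
        h = phi(self.v.T @ x)
        return float(self.wV @ h), self.wL.T @ h, h          # V_hat (40), lambda_hat (41)

    def train_step(self, x, V_target, lam_target):
        """One update of (47)-(48) on the blended objective (46)."""
        x = np.asarray(x, dtype=float)
        V_hat, lam_hat, h = self.forward(x)
        eV = V_hat - V_target                                 # e_ck^V, eq. (44)
        eL = lam_hat - np.asarray(lam_target, dtype=float)    # e_ck^lambda, eq. (45)
        rho = self.rho
        grad_wV = (1.0 - rho) * eV * h
        grad_wL = rho * np.outer(h, eL)
        back = ((1.0 - rho) * eV * self.wV + rho * (self.wL @ eL)) * dphi(self.v.T @ x)
        self.wV -= self.lr * grad_wV
        self.wL -= self.lr * grad_wL
        self.v -= self.lr * np.outer(x, back)
        return (1.0 - rho) * 0.5 * eV ** 2 + rho * 0.5 * float(eL @ eL)  # E_ck, eq. (46)
```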

4.2.3 Action network

In the action network, the sampled state \(x_{k_i}\) is taken as the input, and the output \({{\widehat{\mu }} }(x_{k_i})\) is used to approximate \({{\mu }}({{x}_{{{k}_{i}}}})\). With the activation function \(\phi ( \cdot )\) defined in (35), \({{\widehat{\mu }} }(x_{k_i}) \) can be formulated as

$$\begin{aligned} {{\widehat{\mu }} }(x_{k_i}) = \omega _{a}^T\phi (\upsilon _{a}^T{x_{k_i}}), \end{aligned}$$
(49)

where \({\upsilon _a}\) represents the weight matrix of input-to-hidden layer, and \({\omega _a}\) represents the weight matrix of hidden-to-output layer. According to (10), the target control input at the sampling instant \(k_i\) can be obtained as

$$\begin{aligned} {{\mu }}({{x}_{{{k}_{i}}}})=\overline{U}\varphi \left( -\frac{1}{2}{{\left( \overline{U}R \right) }^{-1}}{{g}^{T}}\left( {{x}_{k_i}} \right) \frac{\partial {{V}^{*}}({{x}_{k_i+1}})}{\partial {{x}_{k_i+1}}} \right) . \end{aligned}$$
(50)

Define the error function \({e_{ak}}\) and the objective error function \({E_{ak}}\) as

$$\begin{aligned} {e_{ak}} =&\,{{\widehat{\mu }} }(x_{k_i})-{{\mu }}({{x}_{{{k}_{i}}}}), \end{aligned}$$
(51)
$$\begin{aligned} {E_{ak}} =&\,\frac{1}{2}e_{ak}^T{e_{ak}}. \end{aligned}$$
(52)

Similarly, the weights updating rule of the action network can be formulated as

$$\begin{aligned} \omega _{a(k + 1)} =&\,\omega _{ak} - {\vartheta _a}\left[ \frac{{\partial {E_{ak}}}}{{\partial \omega _{ak}}}\right] , \end{aligned}$$
(53)
$$\begin{aligned} \upsilon _{a(k + 1)} = \,&\upsilon _{ak} - {\vartheta _a}\left[ \frac{{\partial {E_{ak}}}}{{\partial \upsilon _{ak}}}\right] , \end{aligned}$$
(54)

where \({\vartheta _a} > 0\) is the learning rate.
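
An analogous sketch for the action network (49)–(54) follows; `mu_target` stands for the constrained target control (50), which can for instance be evaluated with the `constrained_control` sketch from Sect. 2 using the critic's \(\widehat{\lambda }\) output and the model network (again, names are illustrative).

```python
import numpy as np

class ActionNetwork:
    """Action network (49): mu_hat(x_{k_i}) = omega_a^T phi(upsilon_a^T x_{k_i})."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.1, seed=2):
        rng = np.random.default_rng(seed)
        self.v = rng.uniform(-0.5, 0.5, (n_in, n_hidden))    # input-to-hidden weights
        self.w = rng.uniform(-0.5, 0.5, (n_hidden, n_out))   # hidden-to-output weights
        self.lr = lr                                         # learning rate theta_a

    def forward(self, x):
        x = np.asarray(x, dtype=float)
        h = phi(self.v.T @ x)
        return self.w.T @ h, h

    def train_step(self, x, mu_target):
        """One gradient step of (53)-(54) on the error (51)-(52)."""
        x = np.asarray(x, dtype=float)
        mu_hat, h = self.forward(x)
        e = mu_hat - np.asarray(mu_target, dtype=float)       # e_ak, eq. (51)
        grad_w = np.outer(h, e)
        grad_v = np.outer(x, (self.w @ e) * dphi(self.v.T @ x))
        self.w -= self.lr * grad_w
        self.v -= self.lr * grad_v
        return 0.5 * float(e @ e)                             # E_ak, eq. (52)
```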

4.3 Approximate optimal algorithm design

Define \({V_\infty }({x_k}) = \mathop {\lim }\limits _{l \rightarrow \infty } {V_l}({x_k})\). If the system state \(x_k\) is controllable, the cost function \({V_\infty }({x_k})\) equals the optimal cost function \({V^*}({x_k})\), that is,

$$\begin{aligned} \mathop {\lim }\limits _{l \rightarrow \infty } {V_l}({x_k}) = {V^*}({x_k}), \end{aligned}$$
(55)

where l is the iteration index of the outer loop. As \(l \rightarrow \infty \), we have \({V_l}({x_k}) \rightarrow {V^*}({x_k})\) in theory. However, it is not possible to iterate indefinitely in the actual computation. Thus, we introduce an error bound \(\varepsilon \) so that the cost function \(V(x_k)\) is regarded as converged after a finite number of iterations [38]. That is, there exists a finite l such that the cost function \({V_l}({x_k})\) satisfies

$$\begin{aligned} \left| {{V^*}({x_k}) - {V_l}({x_k})} \right| \le \varepsilon . \end{aligned}$$
(56)

In the iterative ADP algorithm, this design achieves approximate optimal regulation. However, since the optimal cost function \({V^*}({x_k})\) is unknown in general, it is difficult to use the termination criterion (56) to verify whether the iterative algorithm meets the requirement. Therefore, we use the criterion

$$\begin{aligned} \left| {{V_{l + 1}}({x_k}) - {V_l}({x_k})} \right| \le \varepsilon \end{aligned}$$
(57)

to replace (56).
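
In implementation terms, the stopping rule (57) simply wraps the per-state iteration, as in the following sketch; `one_iteration` is an illustrative placeholder for one sweep of critic and action updates that returns the current estimate \(V_l(x_k)\).

```python
def iterate_until_converged(one_iteration, x_k, eps=1e-6, max_iter=4000):
    """Run the inner iteration for state x_k until |V_{l+1} - V_l| <= eps, eq. (57)."""
    V_prev = one_iteration(x_k)              # V_0(x_k)
    for l in range(1, max_iter + 1):
        V_curr = one_iteration(x_k)          # V_l(x_k) after one more sweep
        if abs(V_curr - V_prev) <= eps:      # termination criterion (57)
            return V_curr, l
        V_prev = V_curr
    return V_prev, max_iter                  # budget exhausted before meeting (57)
```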

5 Simulation results and analysis

In this section, the event-triggered GDHP method is applied to three discrete-time systems. Simulations show the advantages of the proposed method by comparison with the traditional GDHP method.

5.1 Case 1: two-dimensional system

Consider the following discrete-time system:

$$\begin{aligned} {{x}_{k+1}}=\left[ \begin{matrix} {{x}_{1k}+0.1{{x}_{2k}}} \\ -2{{x}_{1k}}+0.7{{x}_{2k}} \\ \end{matrix} \right] +\left[ \begin{matrix} 0 \\ x_{1k} \\ \end{matrix} \right] {{u}_{k}}, \end{aligned}$$
(58)

where \({x_k} = {\left[ {{x_{1k}},{x_{2k}}} \right] ^T} \in {\mathbb {R}^2}\) is the state vector and \( {{u}_{k}}\in \mathbb {R}\) is the control input. The initial state vector is set as \(x_0=[-1,1]^T\) and the bound of the saturated actuator is chosen as \(\left| u \right| \le 0.1\). Let the parameters be \(Q=I_2\) and \(R=I\), where the subscript denotes the dimension of the identity matrix. Based on (21), we set \(P_1+P_2\overline{u}=0.2\). The event-triggered threshold is then

$$\begin{aligned} {e_T} = \frac{{1 - {{0.4}^{k - {k_i}}}}}{{1 - 0.4}} \cdot 0.2\left\| {{x_{{k_i}}}} \right\| . \end{aligned}$$
(59)
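
As a quick numerical illustration of (59) (these values are computed here and are not reported in the paper), the threshold grows within a trigger interval toward the limit \(0.2/0.6\cdot \Vert x_{k_i}\Vert \approx 0.333\,\Vert x_{k_i}\Vert \):

```python
import numpy as np

p_sum = 0.2                                   # P1 + P2 * u_bar chosen for case 1
x_sample = np.array([-1.0, 1.0])              # sampled state, here the initial state x_0
for delta in range(1, 5):                     # elapsed steps k - k_i since the last trigger
    e_T = (1.0 - 0.4 ** delta) / (1.0 - 0.4) * p_sum * np.linalg.norm(x_sample)
    print(delta, e_T)                         # e_T approaches (0.2 / 0.6) * ||x_{k_i}|| ~ 0.471
```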

The model network needs to be pre-trained to identify the system dynamics before the proposed method is implemented. The structure of the model network is designed as 3–8–2 and the learning rate is set as \({\vartheta _m=0.1}\). It is well known that the parameter settings affect the convergence speed and control performance of the algorithm to a certain extent. In order to obtain good control performance, the initial weights of the three networks are randomly selected from [\(-\,0.1\), 0.1] after multiple experiments, which enables the algorithm to achieve high control accuracy. To capture the system dynamics sufficiently, 500 data samples are randomly selected from \([-\,1,1]\) for training. The model network is trained for 50 time steps on each sample, and the training performance is shown in Fig. 2. From Fig. 2, we can see that the training error is large at the beginning; as the number of samples increases, the training error becomes smaller and eventually converges to zero. After the training, the obtained weights are kept unchanged for the subsequent training.

Next, the critic network and the action network are designed with structures 2–8–3 and 2–8–1, respectively. The initial weights of the two networks are randomly generated within \([-\,0.5, 0.5]\) and the adjusting parameter is set as \(\rho = 0.5\). The learning rates are chosen as \({\vartheta _c}=0.01\) and \({\vartheta _a=0.1}\). During the iterative process, each network is trained for 200 inner-loop steps at each of the 4000 training steps. In addition, we set \(\varepsilon ={10^{ - 6}}\) as the termination condition for each state, which ensures that the control law u is an approximate optimal control. As shown in Fig. 3, the weight norms of the two networks converge after training. It should be noted that the weights in Fig. 3 are updated online: during the iterative process for each state, the weights are updated until the termination criterion is satisfied. Over the entire update process, the iteration is performed for 4000 training steps.

Fig. 2 The training result and the state error of the model network

Fig. 3 The norm of the weight matrix for the action and critic networks with the GDHP method

Fig. 4 State trajectories with the traditional GDHP method and the event-triggered GDHP method

In order to demonstrate the effectiveness of the proposed method, the traditional GDHP method is also applied to this example for comparison. The state trajectories and the control input curves over 200 time steps under the two methods are shown in Figs. 4 and 5. As can be seen in Fig. 5, compared with the traditional GDHP method, our method handles the constrained-input problem. The trigger error \(\left\| {{e_k}} \right\| \) and the trigger threshold \({e_T}\) are given in Fig. 6. The error \(\left\| {{e_k}} \right\| \) between the current state and the sampled state is calculated at each step; if \(\left\| {{e_k}} \right\| > {e_T}\), \(\left\| {{e_k}} \right\| \) is reset to zero. In this simulation, the action network of the traditional GDHP method needs 200 samples to update the control input, while the event-triggered GDHP method only needs 82 samples.

Fig. 5 Control input trajectories u with the traditional GDHP method and the event-triggered GDHP method

Fig. 6 Trigger error and trigger threshold

5.2 Case 2: three-dimensional system

Consider the following discrete-time affine nonlinear system presented in [39]:

$$\begin{aligned} {x_{k + 1}} = \left[ {\begin{array}{*{20}{c}} {{x_{1k}}{x_{2k}}}\\ {x_{1k}^2 - 0.5\sin ({x_{2k}})}\\ {{x_{3k}}} \end{array}} \right] + \left[ {\begin{array}{*{20}{c}} 0\\ 1\\ { - 1} \end{array}} \right] {u_k}, \end{aligned}$$
(60)

where the state vector is \({x_k} = {\left[ {{x_{1k}},{x_{2k}},{x_{3k}}} \right] ^T} \in {\mathbb {R}^3}\). Set the initial state vector as \({{x}_{0}}={{\left[ -0.5,0.5,1 \right] }^{T}}\) and the saturation bound as \(\left| u \right| \le 0.1\). The performance index function is chosen as in case 1 with \(Q=I_n\) and \(R=I_m\). The termination error is set as \(\varepsilon ={{10}^{-4}}\). In this case, we let \(P_1+P_2\overline{u}=0.1\), so the trigger threshold becomes

$$\begin{aligned} {e_T} = \frac{{1 - {{0.2}^{k - {k_i}}}}}{{1 - 0.2}} \cdot 0.1\left\| {{x_{{k_i}}}} \right\| . \end{aligned}$$
(61)
Fig. 7 State trajectories x with the traditional GDHP method and the event-triggered GDHP method

The structures of the model, critic, and action networks are designed as 4–8–3, 3–8–1, and 3–8–1, respectively. All the initial weights of the three networks are randomly set in [\(-\,0.1\), 0.1], and the other parameters are designed as in case 1. Similar to case 1, the model network is pre-trained first, and the resulting fixed weights are used to train the critic and action networks.

In this case, we also compare the traditional GDHP method with the event-triggered GDHP method. The state trajectories over 100 time steps under the two methods are shown in Fig. 7. According to the state trajectories, the two methods achieve similar performance. Figure 8 shows the control input trajectories under the two methods. As can be seen from Fig. 8, the proposed method handles the control constraints and reduces the computational burden. The event-triggered error and trigger threshold are shown in Fig. 9, from which we can see that the action network of the traditional GDHP method is updated 100 times, while in the proposed method it is only updated 34 times. Besides, the evolution of the hidden-to-output weights of the action network is shown in Fig. 10. Since the weights in Fig. 10 are obtained through the outer iteration and are only updated at the sampling instants, the number of iterative time steps of the action network equals the number of time steps of the state trajectory.

Fig. 8 Control input trajectories u with the traditional GDHP method and the event-triggered GDHP method

Fig. 9 Trigger error and trigger threshold

Fig. 10 The weights of the action network from the hidden-to-output layer

5.3 Case 3: torsional pendulum system

In this case, we apply the event-triggered GDHP method to the torsional pendulum system, whose mechanical model is shown in Fig. 11. The mathematical description of this system is as follows [40]:

$$\begin{aligned} \left\{ \begin{aligned}&\frac{\mathrm{d}\theta }{\mathrm{d}t}=\omega \\&G\frac{\mathrm{d}\omega }{\mathrm{d}t}=u-Mgl\sin \theta -{{f}_{d}}\frac{\mathrm{d}\theta }{\mathrm{d}t}, \end{aligned} \right. \end{aligned}$$
(62)

where \(M=1/3\) kg is the mass, and \(l=2/3\) m and \(G=4/3\,M{{l}^{2}}\) are the length of the pendulum bar and the rotary inertia, respectively. Let \({{f}_{d}}=0.2\) be the frictional factor and \(g=9.8~\mathrm {m/s^2}\) be the acceleration of gravity. The angle \(\theta \) and the angular velocity \(\omega \) are the state variables of the system.

Fig. 11 The mechanical model of the torsional pendulum

Using the sampling interval \(\varDelta t=0.1\) s, the dynamics of the torsional pendulum system can be discretized as

$$\begin{aligned}{{x}_{k+1}}=\left[ \begin{matrix} 0.1{{x}_{2k}}+{{x}_{1k}} \\ -0.49\sin ({{x}_{1k}})-0.1{{f}_{d}}\cdot {{x}_{2k}}+{{x}_{2k}} \\ \end{matrix} \right] +\left[ \begin{matrix} 0 \\ 0.1 \\ \end{matrix} \right] {{u}_{k}},\end{aligned}$$

where \({{x}_{1k}}=\theta \) and \({{x}_{2k}}=\omega \). The initial state and the control constraint are set as \({{x}_{0}}={{\left[ -1,1 \right] }^{T}}\) and \(\left| u \right| \le 0.3\), respectively. The structures of the model, critic, and action networks are designed as 3–8–2, 2–8–1, and 2–8–1, respectively. The termination error is set as \(\varepsilon ={{10}^{-3}}\). Besides, the trigger threshold is chosen as in case 2 and all other parameters are the same as in case 1.
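
For reference, a direct forward-Euler discretization of (62) with \(\varDelta t=0.1\) s under the listed parameters can be sketched as below; note that the coefficients of the discretized model quoted above follow [40], so they need not coincide exactly with this textbook Euler step.

```python
import numpy as np

# torsional pendulum parameters from the text
M, l, g_acc, f_d, dt = 1.0 / 3.0, 2.0 / 3.0, 9.8, 0.2, 0.1
G = 4.0 / 3.0 * M * l ** 2                   # rotary inertia

def pendulum_euler_step(x, u):
    """One forward-Euler step of (62); x = [theta, omega], u is the scalar torque input."""
    theta, omega = x
    d_theta = omega
    d_omega = (u - M * g_acc * l * np.sin(theta) - f_d * omega) / G
    return np.array([theta + dt * d_theta, omega + dt * d_omega])
```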

In this case, the traditional GDHP method and the proposed method are both applied to the torsional pendulum system. The state trajectories over 100 time steps under the two methods are shown in Fig. 12, which indicates that the control performance of the proposed method is similar to that of the traditional GDHP method. However, the event-triggered GDHP method constrains the control input to the prescribed range, as shown in Fig. 13. To further illustrate the effectiveness of the proposed algorithm, the trigger threshold and the event-triggered error are given in Fig. 14. Compared with the traditional time-triggered method, which requires 100 samples, the event-triggered method only needs 34 samples, a saving of 66%.

Fig. 12 State trajectories with the traditional GDHP method and the event-triggered GDHP method

Fig. 13 Control input trajectories u with the traditional GDHP method and the event-triggered GDHP method

Fig. 14 Trigger error and trigger threshold

6 Conclusion

In this paper, a novel event-triggered method based on the GDHP technique is proposed for a class of discrete-time nonlinear systems with control constraints. In order to solve the constrained-input problem, a non-quadratic performance index is introduced into the utility function. Additionally, we derive a trigger threshold and use the Lyapunov technique to prove the stability of the event-triggered system. Then, the NN implementation based on the GDHP technique is given, and an iterative termination criterion is designed to obtain the approximate optimal control. Finally, three cases are given to demonstrate the effectiveness of the event-triggered GDHP method by comparison with the traditional GDHP method. According to the simulation results of cases 1, 2 and 3, the event-triggered method saves 59%, 66% and 66% of the computing resources, respectively, compared with the traditional time-triggered method. Therefore, it can be concluded that the event-triggered GDHP method can solve the constrained-input problem and reduce the computational burden while guaranteeing the system performance.