
1 Introduction

In recent years, the optimal control of multi-agent systems (MASs) has attracted wide attention. Note that most of the work on adaptive dynamic programming (ADP) is periodically triggered or time-triggered, which leads to inefficient use of communication and computing resources. This problem can be alleviated by introducing event-triggered control. Different from time-triggered control, under an event-triggered scheme an agent updates its control policy only when the measurement error exceeds a threshold set by the designer [1], which greatly reduces resource usage. Therefore, combining the ADP method with the event-triggered mechanism is meaningful work [2,3,4]. Sahoo et al. [3] proposed event-based near-optimal control for uncertain nonlinear discrete-time systems by using input-output data and the ADP method. Wei et al. [4] used a critic neural network (NN) updated at the event-triggered instants to approximate the value function and search for the optimal control policy, resulting in an aperiodic weight adjustment law that reduces the computational cost.

It should be pointed out that most existing work uses the static event-triggered mode, that is, the triggering instants are the time sequence at which the measurement error equals or exceeds the threshold [2,3,4]. At the beginning, the measurement error does not easily satisfy the static event-triggered condition, so resource consumption is effectively reduced [5]. However, as the system approaches the consensus state, the threshold becomes smaller and smaller, causing the static event-triggered condition to be triggered excessively. Therefore, the main motivation of this paper is to develop a more efficient and flexible dynamic event-triggered optimal control scheme based on the ADP method.

As the scale and complexity of MASs increase, the network may need to be divided into different subnets that achieve multiple consensus values to cope with changes in the environment. As a generalization of multi-agent consensus control, group consensus was first mentioned in [6]. Since then, research on group consensus has grown steadily, e.g., [7, 8]. In view of the advantages of group consensus, incorporating it into the optimal control problem is an interesting topic. In addition, increased system complexity means that the interaction between agents is not purely cooperative. Owing to limited resources, competitive interaction is also essential [9]. Moreover, coopetition interaction is common in real environments, such as unmanned aerial vehicles [10] and railway transportation [11]. However, the previously mentioned work on the optimal control problem rarely considers coopetition interaction; most of it addresses cooperation only. Therefore, coopetition interaction in the optimal control problem is also a research focus of this paper.

To sum up, we discuss optimal group consensus control for multi-agent systems in coopetition networks via a dynamic event-triggered method. The main contributions are summarized as follows: (1) Compared with the traditional time-triggered approach to the optimal control problem, this paper uses an event-triggered mechanism. A static event-triggered condition is proposed first and then extended to a dynamic event-triggered condition to solve the excessive-triggering problem caused by the static one. (2) A new tracking error control protocol for MASs in coopetition networks is proposed, and the group consensus of second-order discrete-time MASs is investigated based on the ADP method. (3) The weight estimation errors of the actor-critic neural network are guaranteed to be uniformly ultimately bounded (UUB).

2 Preliminaries

2.1 Graph Theory

Consider the discrete-time MASs with one leader and N followers, and let \(\mathcal{G} = \left\{ \mathcal{V},\mathcal{E},\mathcal{A} \right\} \) be a simple directed communication topology digraph, where \(\mathcal{V} = \left\{ v_1,\ldots ,v_N \right\} \) is the set of agents, \(\mathcal{E} = \left\{ (v_i,v_j)\,|\,v_i,v_j \in \mathcal{V} \right\} \subseteq \mathcal{V} \times \mathcal{V}\) denotes the set of directed edges, and \(\mathcal{A} = {\left[ {{a_{ij}}} \right] _{N \times N}}\) represents the weighted adjacency matrix. \({N_i} = \left\{ j \in \mathcal{V}\,|\,(v_j,v_i) \in \mathcal{E} \right\} \) denotes the neighbor set of agent i. The diagonal matrix \(D = \mathrm{{diag}}\left\{ {{d_1},{d_2},...,{d_N}} \right\} \) is the in-degree matrix of \(\mathcal{G}\), and \(\mathcal{B} = \mathrm{{diag}}\left\{ {{b_1},{b_2},...,{b_N}} \right\} \) is the pinning matrix.

2.2 Problems Statement

Consider the second-order MASs with one leader and N followers. The dynamics of the followers is represented as

$$\begin{aligned} \begin{aligned} \left\{ {\begin{array}{*{20}{c}} {x_{k + 1}^i = Ax_k^i + Bv_k^i}\\ {v_{k + 1}^i = Cv_k^i + {T_i}u_k^i} \end{array}} \right. ,i \in \left\{ {1,2,3,...,N} \right\} \end{aligned} \end{aligned}$$
(1)

where \(x_k^i\), \(u_k^i\) and \(v_k^i\) denote the position state, the control input and the velocity state of agent i, respectively. The system matrices A, B, C and \({T_i}\) are assumed to be completely unknown constant matrices. The dynamics of the leader is given as

$$\begin{aligned} \begin{aligned} \left\{ {\begin{array}{*{20}{l}} {x_{k + 1}^0 = Ax_k^0 + Bv_k^0}\\ {v_{k + 1}^0 = Cv_k^0} \end{array}} \right. \end{aligned} \end{aligned}$$
(2)

where \(x_k^0\) is the position state and \(v_k^0\) is the velocity state of the leader. The leader only generates the desired signals for the followers to track and does not receive signals from other agents.
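To make the dynamics concrete, the following minimal sketch simulates one step of (1) and (2); the scalar matrices A, B, C and \(T_i\) used here are illustrative assumptions, since the paper treats them as unknown.

```python
import numpy as np

# Illustrative (assumed) scalar system matrices; the paper treats A, B, C and T_i
# as completely unknown constant matrices.
A, B, C = np.array([[1.0]]), np.array([[0.1]]), np.array([[1.0]])
T_i = np.array([[0.1]])

def follower_step(x_i, v_i, u_i):
    """One step of the follower dynamics (1)."""
    return A @ x_i + B @ v_i, C @ v_i + T_i @ u_i

def leader_step(x_0, v_0):
    """One step of the leader dynamics (2); the leader has no control input."""
    return A @ x_0 + B @ v_0, C @ v_0

# Example: propagate the leader and one follower for a single step.
x0, v0 = np.array([1.0]), np.array([0.5])
xi, vi = follower_step(np.array([0.0]), np.array([0.0]), np.array([0.2]))
x0, v0 = leader_step(x0, v0)
```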

Define the local neighbor tracking error as

$$\begin{aligned} \begin{aligned} \delta _k^i = \sum \limits _{j \in {N_i}} {{a_{ij}}(} y_k^i - {\varGamma _{ij}}y_k^j) + {b_i}(y_k^i - y_k^0) \end{aligned} \end{aligned}$$
(3)

where \(y_k^i = {\left[ {\begin{array}{*{20}{c}}{x_k^i}&{v_k^i}\end{array}} \right] ^\mathrm{{T}}}\), \(y_k^0 = {\left[ {\begin{array}{*{20}{c}}{x_k^0}&{v_k^0}\end{array}} \right] ^\mathrm{{T}}}\), \({\varGamma _{ij}}\) is the coopetition coefficient. We utilize \({\varGamma _{ij}} < 0\) to denote the competitive interaction between the agent i and j. Otherwise, \({\varGamma _{ij}} > 0\) represents the cooperative interaction.

Notes and Comments. Note that a single equilibrium state can no longer meet the requirements of complex environments and distributed tasks. Therefore, we divide the MASs into different subnets and achieve multiple consensus values by designing a reasonable coopetition coefficient.

Based on (1) and (2), we can obtain

$$\begin{aligned} \left\{ {\begin{array}{*{20}{l}} {y_{k + 1}^i = \varUpsilon y_k^i + {\mathrm{P}_i}u_k^i}\\ {y_{k + 1}^0 = \varUpsilon y_k^0} \end{array}} \right. \end{aligned}$$
(4)

where \(\varUpsilon = \left[ {\begin{array}{*{20}{c}}A&{}B\\ 0&{}C\end{array}} \right] \) and \({\mathrm{P}_i} = \left[ {\begin{array}{*{20}{c}}0\\ {{T_i}}\end{array}} \right] \). Based on (3) and (4), we have the iterative local neighbor tracking error as below

$$\begin{aligned} \begin{aligned} \delta _{k + 1}^i =\varUpsilon \delta _k^i + ({d_i} + {b_i}){\mathrm{P}_i}u_k^i - \sum \limits _{j \in {N_i}} {{a_{ij}}{\varGamma _{ij}}} {\mathrm{P}_i}u_k^j \end{aligned} \end{aligned}$$
(5)

Then the global tracking error can be deduced as

$$\begin{aligned} \begin{aligned} {\delta _k}&= (D - \mathcal{A} \circ \varGamma ){y_k} + \mathcal{B}({y_k} - \bar{y}_k^0)\\&= (\bar{L} + \mathcal{B})({y_k} - \bar{y}_k^0) \end{aligned} \end{aligned}$$
(6)

where \(\bar{L} = D - \mathcal{A} \circ \varGamma \), \({y_k} = {\left[ {y{{_k^1}^\mathrm{{T}}},...,y{{_k^N}^\mathrm{{T}}}} \right] ^\mathrm{{T}}}\) and \(\bar{y}_k^0 = {\left[ {y{{_k^0}^\mathrm{{T}}},...,y{{_k^0}^\mathrm{{T}}}} \right] ^\mathrm{{T}}}\); \(\varGamma = \left[ {{\varGamma _{ij}}} \right] \) is the coopetition coefficient matrix; \( \circ \) denotes the Hadamard product; \({\zeta _k} = {y_k} - \bar{y}_k^0\) is defined as the global consensus error.
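As an illustration, the following sketch assembles the global tracking error (6) for a small placeholder topology; the adjacency, pinning and coopetition values are assumptions, not the topology studied later in this paper.

```python
import numpy as np

N, n = 3, 2                       # number of followers, state dimension per agent
A_adj = np.array([[0., 1., 0.],   # placeholder adjacency matrix [a_ij]
                  [0., 0., 1.],
                  [1., 0., 0.]])
Gamma = np.array([[1., -1., 1.],  # placeholder coopetition coefficients [Gamma_ij]
                  [1., 1., -1.],
                  [1., 1., 1.]])
D = np.diag(A_adj.sum(axis=1))    # in-degree matrix
B_pin = np.diag([1., 0., 0.])     # pinning matrix

L_bar = D - A_adj * Gamma         # D - A ∘ Γ (elementwise Hadamard product)
y_k = np.random.randn(N * n)      # stacked follower states y_k
y0_k = np.tile(np.random.randn(n), N)  # stacked leader state bar{y}_k^0

zeta_k = y_k - y0_k                                    # global consensus error
delta_k = np.kron(L_bar + B_pin, np.eye(n)) @ zeta_k   # global tracking error (6)
```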

Assumption 1

Suppose that \(\sum \limits _{j \in {N_i}} {{a_{ij}}(1 - {\varGamma _{ij}})} = 0\) holds for each agent i, i.e., each row sum of the matrix \(\bar{L}\) is zero.

Based on the work of [12] and Assumption 1, we know that \(\mathop {\lim }\limits _{k \rightarrow \infty } \left\| {{\zeta _k}} \right\| = 0\) once \(\mathop {\lim }\limits _{k \rightarrow \infty } \left\| {{\delta _k}} \right\| = 0\), i.e., the agents in each subnet will be synchronized with their group leader. Therefore, one of our goals is to minimize the global tracking error \({\delta _k}\) by the ADP method.

In order to find the optimal control \(u_k^{i*}\), the discounted value function is designed with the local neighbor tracking errors as

$$\begin{aligned} \begin{aligned} {J_i}\left( {\delta _k^i} \right) \mathrm{{ = }}{U_i}(\delta _k^i,{u_k}) + \alpha {J_i}\left( {\delta _{k + 1}^i} \right) \end{aligned} \end{aligned}$$
(7)

where \({U_i}(\delta _k^i,{u_k}) = \delta _k^{i\mathrm{{T}}}{Q_{ii}}\delta _k^i + u_k^{i\mathrm{{T}}}{R_{ii}}u_k^i + \sum \limits _{j \in {N_i}} {u_k^{j\mathrm{{T}}}{R_{ij}}u_k^j} \) denotes the reward function; \(\alpha \in \left( {0,\left. 1 \right] } \right. \) is the discount factor; \({Q_{ii}} \ge 0\) and \({R_{ij}} \ge 0\) are positive semidefinite symmetric weighting matrices, and \({R_{ii}} > 0\) is a positive definite symmetric weighting matrix so that \(R_{ii}^{ - 1}\) in (9) exists.

By the Bellman optimality principle, the optimal discounted value function \(J_i^*\left( {\delta _k^i} \right) \) satisfies the following discrete-time Hamilton-Jacobi-Bellman (DT-HJB) equation

$$\begin{aligned} \begin{aligned} J_i^ * \left( {\delta _k^i} \right) = \mathop {\min }\limits _{u_k^i} \left\{ {{U_i}(\delta _k^i,{u_k}) + } \right. \left. {\alpha J_i^ * \left( {\delta _{k + 1}^i} \right) } \right\} \end{aligned} \end{aligned}$$
(8)

Under the first-order necessary condition \(\partial \left( {{U_i}(\delta _k^i,{u_k}) + \alpha J_i^*\left( {\delta _{k + 1}^i} \right) } \right) /\partial u_k^i = 0\), the optimal control policy is given as

$$\begin{aligned} u_k^{i*} = - \frac{\alpha }{2}\left( {{b_i} + {d_i}} \right) R_{ii}^{ - 1}T_i^\mathrm{{T}}\partial J_i^*\left( {\delta _{k + 1}^i} \right) /\partial \delta _{k + 1}^i \end{aligned}$$
(9)
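For completeness, a sketch of the stationarity step behind (9) is given below. It assumes the necessary condition \(\partial \left( {{U_i} + \alpha J_i^*} \right) /\partial u_k^i = 0\) together with \(\partial \delta _{k + 1}^i/\partial u_k^i = ({d_i} + {b_i}){\mathrm{P}_i}\) from (5); since \(\mathrm{P}_i^{\mathrm{T}} = \left[ {0\;\;T_i^{\mathrm{T}}} \right] \), only \(T_i^{\mathrm{T}}\) survives in (9).

$$\begin{aligned} 0 &= \frac{\partial }{{\partial u_k^i}}\left( {{U_i}(\delta _k^i,{u_k}) + \alpha J_i^*\left( {\delta _{k + 1}^i} \right) } \right) = 2{R_{ii}}u_k^i + \alpha \left( {{d_i} + {b_i}} \right) \mathrm{P}_i^{\mathrm{T}}\frac{{\partial J_i^*\left( {\delta _{k + 1}^i} \right) }}{{\partial \delta _{k + 1}^i}}\\ &\Rightarrow u_k^{i*} = - \frac{\alpha }{2}\left( {{b_i} + {d_i}} \right) R_{ii}^{ - 1}\mathrm{P}_i^{\mathrm{T}}\frac{{\partial J_i^*\left( {\delta _{k + 1}^i} \right) }}{{\partial \delta _{k + 1}^i}} \end{aligned}$$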

3 Main Results

3.1 Event-Triggered Framework

The traditional ADP method usually adopts periodic sampling, which causes a heavy computational burden and a waste of resources as the system scale and complexity increase. Therefore, we introduce the event-triggered mechanism into the ADP method to reduce the sampling frequency. The measurement error is defined as

$$\begin{aligned} \begin{aligned} e_k^i = \delta _k^i - \delta _{k_s^i}^i,\mathrm{{ }}k \in \left[ {k_s^i,k_{s + 1}^i} \right) \end{aligned} \end{aligned}$$
(10)

where \(k_s^i\) is the sth triggering instant of agent i. The monotonically increasing sequence of triggering instants is determined by

$$\begin{aligned} \left\{ {\begin{array}{*{20}{l}} {k_1^i = 0}\\ {k_{s + 1}^i = \mathop {\inf }\limits _{l > k_s^i,} \left\{ \begin{array}{l} l:{\left\| {e_k^i} \right\| ^2} - \frac{{1 - 2{F^2}}}{{2{F^2} - {\sigma _i}}}{\left\| {\delta _k^i} \right\| ^2}\\ + \frac{{\sum \limits _{j \in {N_i}} {{{\left\| {{{\hat{\omega }}_{aj}}(k_{s + 1}^j){\psi _{aj}}({z_{aj}}(k_{s + 1}^j))} \right\| }^2}} }}{{2{F^2} - {\sigma _i}}} \ge 0 ,\forall k \in \left( {k_s^i,\left. l \right] } \right. \end{array} \right\} } \end{array}} \right. \end{aligned}$$
(11)

where F and \({\sigma _i}\) are positive constants, \({\hat{\omega }_{aj}}\) is the weight of the actor network of agent j, \({\psi _{aj}}( \cdot ) = \tanh ( \cdot )\) denotes the activation function, and \({z_{aj}}\) is the input vector constructed from \({\delta _j}(k_s^j)\).

The above condition (11) is a static event-triggered condition with constant threshold parameters that must be determined by the operation engineer or designer. However, it may cause agents to communicate unnecessarily as consensus is approached; that is, condition (11) is over-triggered. Taking this into account, we propose a dynamic event-triggered law that introduces an external threshold. The sequence of triggering instants is determined by the following dynamic event-triggered condition

$$\begin{aligned} \left\{ {\begin{array}{*{20}{l}} {k_1^i = 0}\\ {k_{s + 1}^i = \mathop {\inf }\limits _{l > k_s^i,} \left\{ \begin{array}{l} l:{\theta _i}\left( {{{\left\| {e_k^i} \right\| }^2} - \frac{{1 - 2{F^2}}}{{2{F^2} - {\sigma _i}}}{{\left\| {\delta _k^i} \right\| }^2}} \right. \\ + \left. {\frac{{\sum \limits _{j \in {N_i}} {{{\left\| {{{\hat{\omega }}_{aj}}(k_{s + 1}^j){\psi _{aj}}({z_{aj}}(k_{s + 1}^j))} \right\| }^2}} }}{{2{F^2} - {\sigma _i}}}} \right) \ge \eta _k^i,\forall k \in \left( {k_s^i,\left. l \right] } \right. \end{array} \right\} } \end{array}} \right. \end{aligned}$$
(12)

where F, \({\sigma _i}\), \({\hat{\omega }_{aj}}\), \({\psi _{aj}}( \cdot )\) and \({z_{aj}}\) are defined as in (11), \({\theta _i}\) is a positive constant, and \(\eta _k^i\) is an external threshold satisfying

$$\begin{aligned} \begin{array}{l} \dot{\eta }_k^i\mathrm{{ = }} - {\rho _i}\eta _k^i + {\sigma _i}\left( {\frac{{1 - 2{F^2}}}{{2{F^2} - {\sigma _i}}}{{\left\| {\delta _k^i} \right\| }^2} - {{\left\| {e_k^i} \right\| }^2}} \right. \left. { - \frac{1}{{2{F^2} - {\sigma _i}}}\sum \limits _{j \in {N_i}} {{{\left\| {{{\hat{\omega }}_{aj}}(k_{s + 1}^j){\psi _{aj}}({z_{aj}}(k_{s + 1}^j))} \right\| }^2}} } \right) \end{array} \end{aligned}$$
(13)

with \(\eta _0^i > 0\) and \({\rho _i} > 0\).
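A minimal sketch of how the trigger checks (11)-(12) and the threshold dynamics (13) could be evaluated for one agent is given below. Since (13) is written with a derivative, a forward-Euler step of unit length is used here as an assumption; all function and variable names are placeholders.

```python
import numpy as np

def actor_term(w_aj_list, psi_aj_list):
    """Neighbor sum of ||w_aj^T psi_aj||^2 appearing in (11)-(13)."""
    return sum(np.linalg.norm(w.T @ p) ** 2 for w, p in zip(w_aj_list, psi_aj_list))

def static_trigger(e_i, delta_i, nbr_term, F, sigma_i):
    """Static condition (11): trigger when the expression is non-negative."""
    lhs = (np.linalg.norm(e_i) ** 2
           - (1 - 2 * F**2) / (2 * F**2 - sigma_i) * np.linalg.norm(delta_i) ** 2
           + nbr_term / (2 * F**2 - sigma_i))
    return lhs >= 0

def dynamic_trigger(e_i, delta_i, nbr_term, eta_i, F, sigma_i, theta_i):
    """Dynamic condition (12): same expression, scaled by theta_i and compared
    with the external threshold eta_i."""
    lhs = theta_i * (np.linalg.norm(e_i) ** 2
                     - (1 - 2 * F**2) / (2 * F**2 - sigma_i) * np.linalg.norm(delta_i) ** 2
                     + nbr_term / (2 * F**2 - sigma_i))
    return lhs >= eta_i

def update_eta(eta_i, e_i, delta_i, nbr_term, F, sigma_i, rho_i, dt=1.0):
    """Threshold dynamics (13), discretised here by a forward-Euler step (assumption)."""
    d_eta = (-rho_i * eta_i
             + sigma_i * ((1 - 2 * F**2) / (2 * F**2 - sigma_i) * np.linalg.norm(delta_i) ** 2
                          - np.linalg.norm(e_i) ** 2
                          - nbr_term / (2 * F**2 - sigma_i)))
    return eta_i + dt * d_eta
```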

Notes and Comments. In (12), \(\eta _k^i\) varies in real time according to the measurement error, the local neighbor tracking error and the weights of the actor network. Compared with the static event-triggered condition (11), the dynamic one is essentially a static triggering mechanism with "history" information. If \(\eta _k^i\) is set to 0, the dynamic event-triggered condition (12) reduces to the static one, which can therefore be seen as a special case. Note that the external threshold \(\eta _k^i\) alleviates the excessive-triggering problem caused by static triggering conditions, thus saving communication and computing resources.

3.2 Implementation of the ADP Method

In order to find the optimal discounted value function and control policy, an online-learning policy iteration (PI) algorithm is presented.

PI Algorithm: Let \({k_s} = k = 0\) and \(s = 0\); choose an initial admissible control policy \(u_0^i\) and a sufficiently small positive number \({\iota _i} > 0\).

  1) Compute the measurement error and the tracking error.

  2) If the dynamic event-triggered condition (12) is satisfied, compute the iterative local value function by (7).

  3) Set \(s = s + 1\), \({k_s} = k\). Update the control policy by (9).

  4) Stop when \(\left| {{J_i}\left( {\delta _k^i} \right) - {J_i}\left( {\delta _{k + 1}^i} \right) } \right| < {\iota _i}\); otherwise set \(k = k + 1\) and return to step 1).

  5) Return \(u_{{k_s}}^i\) and \({J_i}\left( {\delta _k^i} \right) \).

Through continuous iteration of the above algorithm, \({u^*}\) can finally be obtained. The proof of convergence of the above PI algorithm is similar to the work in [13], so it is not repeated in this paper.
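The following is a minimal sketch of the PI loop above; the `agent_i` object and its methods are placeholders standing in for the problem-specific computations (the errors, condition (12), value function (7) and policy update (9)), not an implementation taken from the paper.

```python
def policy_iteration(agent_i, u0_i, iota_i, max_steps=10_000):
    """Event-triggered PI loop for agent i following steps 1)-5); `agent_i` is a
    placeholder object exposing the problem-specific computations."""
    u, k, s = u0_i, 0, 0
    J_prev = None
    while k < max_steps:
        e_i, delta_i = agent_i.measurement_and_tracking_error(k)   # step 1)
        if agent_i.dynamic_trigger(e_i, delta_i, k):               # step 2): condition (12)
            J = agent_i.value_function(delta_i, u)                 # evaluate (7)
            s += 1                                                 # step 3): s = s + 1, k_s = k
            u = agent_i.update_policy(delta_i)                     # update by (9)
            if J_prev is not None and abs(J - J_prev) < iota_i:    # step 4): stopping test
                break
            J_prev = J
        k += 1
    return u, J_prev                                               # step 5)
```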

Since the discrete-time HJB equation is very difficult to solve, we use an actor-critic neural network within the PI algorithm to obtain an approximate optimal solution. Figure 1 shows the structure of the event-triggered optimal consensus control scheme. When the event-triggered condition is satisfied, the current time and state are recorded as the sampling time and state, respectively, and then sent to the controller. The optimal control policy is approximated by the actor-critic network.

Fig. 1. Schematic of the event-triggered ADP method.

(1) Critic NN

The discounted value function is approximated by the following critic network

$$\begin{aligned} {\hat{J}_i}(k_s^i) = \hat{\omega }_{ci}^\mathrm{{T}}(k_s^i){\psi _{ci}}({z_{ci}}(k_s^i)) \end{aligned}$$
(14)

where \({\hat{\omega }_{ci}}\) is the weight of the critic network; \({z_{ci}}\) is the input vector composed of \(\delta _{{k_s}}^i\), \(u_{{k_s}}^i\) and \(u_{{k_s}}^{{N_i}}\); \({\psi _{ci}}( \cdot ) = \tanh ( \cdot )\) denotes the activation function.

Based on the discounted value function (7) and the approximate discounted value function (14), we define the error function as

$$\begin{aligned} {E_{ci}}(k) = \frac{1}{2}\vartheta _{ci}^\mathrm{{T}}(k){\vartheta _{ci}}(k) \end{aligned}$$
(15)

where \({\vartheta _{ci}}(k) = {U_i}(\delta _k^i,{u_{{k_s}}}) + \alpha {\hat{J}_i}(k_s^i + 1) - {\hat{J}_i}(k_s^i)\). The goal is to minimize this error function by adjusting the weights of the neural network. The gradient-based weight update policy is derived as

$$\begin{aligned} \begin{array}{l} {{\hat{\omega }}_{ci}}(k) \mathrm{{ = }}\left\{ \begin{array}{l} {{\hat{\omega }}_{ci}}(k - 1) - {\kappa _{ci}}\alpha {\psi _{ci}}({z_{ci}}(k - 1)){\vartheta _{ci}}(k - 1),\mathrm{{ }}k\mathrm{{ = }}k_s^i;\\ {{\hat{\omega }}_{ci}}(k - 1),\mathrm{{ }}k \in \left( {k_s^i,k_{s + 1}^i} \right) \end{array} \right. \end{array} \end{aligned}$$
(16)
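A minimal sketch of the critic approximation (14), the error term in (15)-(16) and the event-triggered weight update (16) is given below; the vector shapes and helper names are illustrative assumptions.

```python
import numpy as np

def critic_output(w_ci, z_ci):
    """Critic approximation (14): J_hat = w_ci^T tanh(z_ci)."""
    return float(w_ci.T @ np.tanh(z_ci))

def critic_error(U_i, J_hat_next, J_hat, alpha):
    """Error term theta_ci(k) in (15)-(16)."""
    return U_i + alpha * J_hat_next - J_hat

def critic_update(w_ci, z_prev, theta_prev, kappa_ci, alpha, triggered):
    """Weight update (16): gradient step at triggering instants, hold otherwise."""
    if triggered:
        return w_ci - kappa_ci * alpha * np.tanh(z_prev) * theta_prev
    return w_ci
```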

(2) Actor NN

The control policy is approximated by the following actor network

$$\begin{aligned} \hat{u}_{{k_s}}^i = \hat{\omega }_{ai}^\mathrm{{T}}(k_s^i){\psi _{ai}}({z_{ai}}(k_s^i)) \end{aligned}$$
(17)

where \(\hat{\omega }_{ai}\), \({\psi _{ai}}( \cdot )\) and \({z_{ai}}\) are defined as in (11). The output error function with the approximate control policy (17) is defined as

$$\begin{aligned} {E_{ai}}(k) = \frac{1}{2}\vartheta _{ai}^\mathrm{{T}}(k){\vartheta _{ai}}(k) \end{aligned}$$
(18)

with \({\vartheta _{ai}}(k) = \hat{u}_k^i - \tilde{u}_k^i\), where \(\tilde{u}_k^i\) is the target output of the actor network. The actor network weight update policy can be derived by the gradient descent method as

$$\begin{aligned} \begin{array}{l} {{\hat{\omega }}_{ai}}(k) = \left\{ \begin{array}{l} {{\hat{\omega }}_{ai}}(k - 1) - {\kappa _{ai}}{\psi _{ai}}({z_{ai}}(k - 1))\\ \times \left[ {{{\hat{\omega }}_{ai}}(k - 1){\psi _{ai}}({z_{ai}}(k - 1)) - \tilde{u}_k^i} \right] \mathrm{{, }}k\mathrm{{ = }}k_s^i\mathrm{{; }}\\ {{\hat{\omega }}_{ai}}(k - 1),\mathrm{{ }}k \in \left( {k_s^i,k_{s + 1}^i} \right) \end{array} \right. \end{array} \end{aligned}$$
(19)
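Similarly, a minimal sketch of the actor approximation (17) and the event-triggered update (19) follows; treating \(\hat{\omega }_{ai}\) as an \(m \times p\) matrix mapping m hidden features to a p-dimensional control is an assumption about the dimensions.

```python
import numpy as np

def actor_output(w_ai, z_ai):
    """Actor approximation (17): u_hat = w_ai^T tanh(z_ai)."""
    return w_ai.T @ np.tanh(z_ai)

def actor_update(w_ai, z_prev, u_target, kappa_ai, triggered):
    """Weight update (19): step toward the target policy at triggering instants,
    hold the weights between triggers."""
    if triggered:
        psi = np.tanh(z_prev)
        return w_ai - kappa_ai * np.outer(psi, w_ai.T @ psi - u_target)
    return w_ai
```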

3.3 Stability Analysis

We analyze the stability of the MASs, which combines the event-triggered mechanism and the neural network structure, in two cases according to whether the events are triggered or not. To facilitate the analysis, we first define two approximation errors. The critic network weight approximation error \({\tilde{\omega }_{ci}}(k_s^i)\) is defined as

$$\begin{aligned} {\tilde{\omega }_{ci}}(k_s^i) = {\hat{\omega }_{ci}}(k_s^i) - \omega _{ci}^ * \end{aligned}$$
(20)

where \(\omega _{ci}^ * \) is the target weight of the critic network. The actor network weight approximation error \({\tilde{\omega }_{ai}}(k_s^i)\) is defined as

$$\begin{aligned} {\tilde{\omega }_{ai}}(k_s^i) = {\hat{\omega }_{ai}}(k_s^i) - \omega _{ai}^ * \end{aligned}$$
(21)

where \(\omega _{ai}^ * \) is the target weight of the actor network. When the measurement error of the MASs satisfies the proposed dynamic event-triggered condition (12), we have the following theorem.

Theorem 1

When the dynamic event-triggered condition is satisfied, the MASs updates the control policy of the current agent and the weights of the neural networks associated with it. The approximation errors \({\tilde{\omega }_{ci}}(k_s^i)\) and \({\tilde{\omega }_{ai}}(k_s^i)\) can be guaranteed to be UUB.

Proof

Define the following Lyapunov function

$$\begin{aligned} {L_i}(k_s^i) = {L_{i1}}(k_s^i) + {L_{i2}}(k_s^i) \end{aligned}$$
(22)

where \({L_{i1}}(k_s^i) = \tilde{\omega }_{ci}^T(k_s^i){\tilde{\omega }_{ci}}(k_s^i)\) and \({L_{i2}}(k_s^i) = \tilde{\omega }_{ai}^T(k_s^i){\tilde{\omega }_{ai}}(k_s^i)\). According to (15) and (16), we can get

$$\begin{aligned} \begin{array}{l} {{\tilde{\omega }}_{ci}}(k_{s + 1}^i) = {{\hat{\omega }}_{ci}}(k_s^i) - {\kappa _{ci}}\alpha {\psi _{ci}}({z_{ci}}(k)){\vartheta _{ci}}(k) - \omega _{ci}^ * \\ = (1 - {\lambda _1}){{\tilde{\omega }}_{ci}}(k_s^i) - {\varepsilon _1} \end{array} \end{aligned}$$
(23)

where \(\varDelta {\psi _{ci}}({z_{ci}}(k)) = {\psi _{ci}}({z_{ci}}(k + 1)) - {\psi _{ci}}({z_{ci}}(k))\), \({\lambda _1}\) is an eigenvalue of \(\alpha {\kappa _{ci}}{\psi _{ci}}({z_{ci}}(k))(1 - \alpha )\varDelta \psi _{ci}^T({z_{ci}}(k))\) and \({\lambda _1} \in \left( { - \infty ,1} \right) \) can be designed by choosing the appropriate \({\kappa _{ci}}\), and \({\varepsilon _1} = \alpha {\kappa _{ci}}{\psi _{ci}}({z_{ci}}(k))\left( (\alpha - 1)\varDelta \psi _{ci}^T({z_{ci}}(k))\omega _{ci}^ * -\right. {}\) \(\left. {U_i}(\delta _k^i,{u_{{k_s}}}) \right) \). \(\varDelta {\psi _{ci}}({z_{ci}}(k))\) and \({\varepsilon _1}\) are bounded since \({\psi _{ci}}({z_{ci}}(k))\) is bounded. Then the difference of \({L_{i1}}(k_s^i)\) is given by

$$\begin{aligned} \begin{array}{l} \varDelta {L_{i1}}(k_s^i)\\ = {\left( {\left( {1 - {\lambda _1}} \right) {{\tilde{\omega }}_{ci}}(k_s^i) - {\varepsilon _1}} \right) ^\mathrm{{T}}}\left( {\left( {1 - {\lambda _1}} \right) {{\tilde{\omega }}_{ci}}(k_s^i) - {\varepsilon _1}} \right) - \tilde{\omega }_{ci}^T(k_s^i){{\tilde{\omega }}_{ci}}(k_s^i)\\ \le {\left( {1 - {\lambda _1}} \right) ^2}{\left\| {{{\tilde{\omega }}_{ci}}(k_s^i)} \right\| ^2} + {\left\| {{\varepsilon _1}} \right\| ^2} + \left( {{\lambda _1} - 1} \right) \left( {{{\left\| {{{\tilde{\omega }}_{ci}}(k_s^i)} \right\| }^2} + {{\left\| {{\varepsilon _1}} \right\| }^2}} \right) - {\left\| {{{\tilde{\omega }}_{ci}}(k_s^i)} \right\| ^2}\\ \le \left( {\lambda _1^2 - {\lambda _1} - 1} \right) {\left\| {{{\tilde{\omega }}_{ci}}(k_s^i)} \right\| ^2} + {\lambda _1}{\left\| {{\varepsilon _1}} \right\| ^2} \end{array} \end{aligned}$$
(24)

Based on the work in [2], when \({\lambda _1} = 0\) the following condition is satisfied:

$$\begin{aligned} \varDelta {L_{i1}}(k_s^i) \le - {\left\| {{{\tilde{\omega }}_{ci}}(k_s^i)} \right\| ^2} \end{aligned}$$
(25)

Since \(\left\| {{{\tilde{\omega }}_{ci}}(k_s^i)} \right\| > 0\), \(\varDelta {L_{i1}}(k_s^i) < 0\). Then the difference of \({L_{i2}}(k_s^i)\) is given by

$$\begin{aligned} \begin{aligned} \varDelta {L_{i2}}(k_s^i)&= \tilde{\omega }_{ai}^\mathrm{{T}}(k_{s + 1}^i){{\tilde{\omega }}_{ai}}(k_{s + 1}^i) - \tilde{\omega }_{ai}^\mathrm{{T}}(k_s^i){{\tilde{\omega }}_{ai}}(k_s^i)\\&= \left[ {{{\left( {1 - {\kappa _{ai}}\psi _{ai}^2({z_{ai}}(k_s^i))} \right) }^2} - 1} \right] {\left\| {{{\tilde{\omega }}_{a\mathrm{{i}}}}(k_s^i)} \right\| ^2} \end{aligned} \end{aligned}$$
(26)

Since \({\kappa _{ai}} \in \left( {0,1} \right) \), \({\psi _{ai}}\left( \cdot \right) \in \left( { - 1,1} \right) \) and \(\left\| {{{\tilde{\omega }}_{a\mathrm{{i}}}}(k_s^i)} \right\| > 0\), the difference \(\varDelta {L_{i2}}(k_s^i) < 0\). Based on (25) and (26), we know that \(\varDelta {L_i}(k_s^i) = \varDelta {L_{i1}}(k_s^i) + \varDelta {L_{i2}}(k_s^i) < 0\). That is to say, \({\tilde{\omega }_{ci}}(k_s^i)\) and \({\tilde{\omega }_{a\mathrm{{i}}}}(k_s^i)\) are UUB and the system is ultimately bounded at the event-triggered instants. The proof is completed.

Assumption 2

There exists a constant \(F > 0\) such that \(\left\| {\delta _{k + 1}^i} \right\| \le F\left\| {\delta _k^i} \right\| + F\left\| {e_k^i} \right\| \).

Under the above assumptions, when the measurement error of the MASs does not meet the proposed dynamic event-triggered condition, we have the following theorem.

Theorem 2

When the measurement error of agent i does not satisfy condition (12), the agent does not update its control policy or the weights of its neural networks. The MASs can be guaranteed to be asymptotically stable if there exist positive scalars \({\sigma _i}\), F, \({\rho _i}\) and \({\theta _i}\) such that

$$\begin{aligned} \left\{ {\begin{array}{*{20}{l}} {{\sigma _i} > 0}\\ {1 - 2{F^2}< 0}\\ {0< 2{F^2} - {\sigma _i} < {\rho _i}{\theta _i}} \end{array}} \right. \end{aligned}$$
(27)

Proof

Consider the following Lyapunov function

$$\begin{aligned} L_i^2(k) = \delta _k^{i\mathrm{{T}}}\delta _k^i + \sum \limits _{j \in {N_i}} {u_k^{j\mathrm{{T}}}} u_k^j + \eta _k^i \end{aligned}$$
(28)

Since the dynamic event-triggered condition (12) is not satisfied, we have

$$\begin{aligned} \begin{array}{l} {\theta _i}\left( {{{\left\| {e_k^i} \right\| }^2} - \frac{{1 - 2{F^2}}}{{2{F^2} - {\sigma _i}}}{{\left\| {\delta _k^i} \right\| }^2}} \right. + \frac{1}{{2{F^2} - {\sigma _i}}} \left. { \times \sum \limits _{j \in {N_i}} {{{\left\| {{{\hat{\omega }}_{aj}}(k_{s + 1}^j){\psi _{aj}}({z_{aj}}(k_{s + 1}^j))} \right\| }^2}} } \right) < \eta _k^i \end{array} \end{aligned}$$
(29)

Based on (13), (29) and Assumption 2, the difference of \(L_i^2(k)\) can be derived as

$$\begin{aligned} \begin{array}{l} \varDelta L_i^2(k)\\ = {\left\| {\delta _{k + 1}^i} \right\| ^2} - {\left\| {\delta _k^i} \right\| ^2} + \sum \limits _{j \in {N_i}} {\left( {{{\left\| {u_{k + 1}^j} \right\| }^2} - {{\left\| {u_k^j} \right\| }^2}} \right) } + \dot{\eta }_k^i\\ \le \left( {\frac{{{\sigma _i}(1 - 2{F^2})}}{{2{F^2} - {\sigma _i}}}} \right) {\left\| {\delta _k^i} \right\| ^2} + \frac{{\left( {2{F^2} - {\sigma _i}} \right) - {\rho _i}{\theta _i}}}{{{\theta _i}}}\eta _k^i - \frac{{{\sigma _i}}}{{2{F^2} - {\sigma _i}}}\sum \limits _{j \in {N_i}} {{{\left\| {u_{k + 1}^j} \right\| }^2}} - \sum \limits _{j \in {N_i}} {{{\left\| {u_k^j} \right\| }^2}} \end{array} \end{aligned}$$
(30)

When \({\sigma _i}\), F, \({\rho _i}\) and \({\theta _i}\) satisfy the condition (27), it can be known that \(\varDelta L_i^2(k) \le 0\) and the system is asymptotically stable. The proof is completed.

4 Simulation

We verify the effectiveness of the proposed ADP method under the dynamic event-triggered mechanism through simulation experiments with three trigger modes, namely, time trigger, static event trigger and dynamic event trigger. Consider the MASs (4) with the topology of Fig. 2; in practical applications, each UAV, robot and so on can be abstracted as an agent node. The initial parameters of the event-triggered mechanism are given as follows. Pinning gains: \({b_1} = 1\), \({b_2} = {b_3} = {b_4} = {b_5} = {b_6} = {b_7} = 0\). Coopetition coefficients: \({\varGamma _{21}} = -0.1\), \({\varGamma _{25}} = 2.1\), \({\varGamma _{31}} = 1\), \({\varGamma _{42}} = 1\), \({\varGamma _{54}} = 1\), \({\varGamma _{63}} = 3\), \({\varGamma _{67}} = -1\). Discount factor: \(\alpha = 0.95\); learning rates: \({\kappa _{ci}} = {\kappa _{ai}} = 0.03\). Event-triggered parameters: \({\theta _i} = 0.6\), \({\rho _i} = 0.59\), \(F = 0.729\), \({\sigma _i} = 0.72\). In the simulation experiments under the three trigger modes, the agents are given the same initial positions and velocities, and the MASs finally reaches bipartite consensus.
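For reference, the stated parameters can be collected as in the sketch below; the topology of Fig. 2 is not reproduced here, so any adjacency matrix must be supplied separately, and the zero-based indexing of \(\varGamma _{ij}\) is a coding convention rather than the paper's notation.

```python
import numpy as np

N = 7  # seven followers plus one leader

params = {
    "b": np.array([1., 0., 0., 0., 0., 0., 0.]),           # pinning gains b_1..b_7
    "alpha": 0.95,                                          # discount factor
    "kappa_c": 0.03, "kappa_a": 0.03,                       # critic/actor learning rates
    "theta": 0.6, "rho": 0.59, "F": 0.729, "sigma": 0.72,   # event-triggered parameters
}

# Nonzero coopetition coefficients Gamma_ij as listed in the text (zero-based indices).
Gamma = np.zeros((N, N))
Gamma[1, 0], Gamma[1, 4] = -0.1, 2.1                    # Gamma_21, Gamma_25
Gamma[2, 0], Gamma[3, 1], Gamma[4, 3] = 1.0, 1.0, 1.0   # Gamma_31, Gamma_42, Gamma_54
Gamma[5, 2], Gamma[5, 6] = 3.0, -1.0                    # Gamma_63, Gamma_67
```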

Figure 3 shows that, under the designed dynamic event-triggered mechanism, the control policies remain unchanged for certain periods of time because of the threshold we set. Figures 4 and 5 show that, even though the event trigger reduces communication among agents, bipartite consensus is still reached. Under the proposed event-triggered modes, whenever the measurement error violates the event-triggered condition (11) or (12), the agent is triggered and the control policy is updated. Therefore, compared with the time-triggered mechanism in which every instant is triggered, the event-triggered mechanisms proposed in this paper significantly reduce resource consumption. Moreover, Fig. 6 shows that, owing to the external threshold added in the dynamic event-triggered condition, the dynamic mechanism triggers fewer events than the static one.

Fig. 2. Topology of the complex systems.

Fig. 3. Trajectories of the control policies.

Fig. 4. Trajectories of position states under three trigger modes: (a) time-triggered mode; (b) static event-triggered mode; (c) dynamic event-triggered mode.

Fig. 5. Trajectories of velocity states under three trigger modes: (a) time-triggered mode; (b) static event-triggered mode; (c) dynamic event-triggered mode.

Fig. 6. Event-triggered instants of agents under three trigger modes: (a) time-triggered mode; (b) static event-triggered mode; (c) dynamic event-triggered mode.

5 Conclusion

In this paper, the optimal tracking control problem of discrete-time second-order MASs under an event-triggered mechanism is investigated. A dynamic event-triggered mechanism is proposed to solve the excessive-triggering problem caused by static event-triggered conditions. Under this mechanism, the optimal control problem is studied by the ADP method, and the stability of the system is proved.