Abstract
In this paper, the optimal group consensus control of second-order multi-agent systems is investigated by using the adaptive dynamic programming (ADP) method under the event-triggered mechanism. In order to meet the needs of group consensus, a novel tracking error protocol involving coopetition interaction is proposed. On this basis, the optimal group consensus problem is established by using the Bellman optimality principle. Then, the dynamic event-triggered mechanism is introduced into the ADP method to save computing and communication resources. The stability of the system is proved by using Lyapunov stability theory. Finally, the simulation results show the effectiveness of the proposed method.
1 Introduction
In recent years, the optimal control of multi-agent systems (MASs) has attracted wide attention. Note that most existing work on ADP is periodically triggered or time-triggered, which leads to inefficient use of communication and computing resources. This problem can be alleviated by introducing event-triggered control methods. Different from time-triggered control, under the event-triggered control method an agent updates its control policy only when the measurement error exceeds a threshold set by the engineer [1]. This greatly reduces resource usage. Therefore, combining the ADP method with the event-triggered mechanism is meaningful work [2,3,4]. Sahoo et al. [3] proposed event-based near-optimal control for uncertain nonlinear discrete-time systems by using input-output data and the ADP method. Wei et al. [4] used a critic neural network (NN) updated at the event-triggered instants to approximate the value function and search for the optimal control policy, resulting in an aperiodic weight adjustment law that reduces the computational cost.
It should be pointed out that most existing work uses a static event-triggered mode; that is, the triggering instants form the time sequence at which the measurement error equals or exceeds the threshold [2,3,4]. In the early stage, the measurement error does not easily satisfy the static event-triggered condition, so resource consumption is effectively reduced [5]. However, as the system approaches the consensus state, the threshold becomes smaller and smaller, causing the static event-triggered condition to be triggered excessively. Therefore, the main motivation of this paper is to develop a more efficient and flexible dynamic event-triggered optimal control scheme based on the ADP method.
As the scale and complexity of MASs increase, a system may need to be divided into different subnets that achieve multiple consensus values to cope with changes in the environment. As a generalization of multi-agent consensus control, group consensus was first mentioned in [6]. Thereafter, research on group consensus has grown steadily, such as [7, 8]. In view of the advantages of group consensus, incorporating it into the optimal control problem is a very interesting topic. In addition, the increase in system complexity means that the interaction between agents is not purely cooperative. Due to limited resources, competitive interaction is also essential [9]. Moreover, coopetition interaction is not uncommon in real environments, such as unmanned aerial vehicles [10] and railway transportation [11]. However, the work on the optimal control problem mentioned above rarely considers coopetition interaction; most of it assumes purely cooperative interaction. Therefore, coopetition interaction in the optimal control problem is also a research focus of this paper.
To sum up, we discuss optimal group consensus control for multi-agent systems in coopetition networks via dynamic event-triggered methods. The main contributions are summarized as follows: (1) In contrast to the traditional time-triggered method for the optimal control problem, this paper uses the event-triggered mechanism. A static event-triggered condition is first proposed, and then it is extended to a dynamic event-triggered condition to solve the excessive-triggering problem caused by the static one. (2) A new tracking error control protocol for MASs in coopetition networks is proposed, and the group consensus of second-order discrete MASs is investigated based on the ADP method. (3) The weight estimation errors of the actor-critic neural network are guaranteed to be uniformly ultimately bounded (UUB).
2 Preliminaries
2.1 Graph Theory
Consider the discrete-time MASs with one leader and N followers. Let \(\mathcal{G} = \left\{ \mathcal{V},\mathcal{E},\mathcal{A} \right\} \) be a simple directed communication topology, where \(\mathcal{V} = \left\{ v_1,\ldots ,v_N \right\} \) is the set of agents, \(\mathcal{E} = \left\{ (v_i,v_j)\,|\,v_i,v_j \in \mathcal{V} \right\} \subseteq \mathcal{V} \times \mathcal{V}\) denotes the set of directed edges, and \(\mathcal{A} = {\left[ {{a_{ij}}} \right] _{N \times N}}\) represents the weighted adjacency matrix. \({N_i} = \left\{ j \in \mathcal{V}\,|\,({v_j},{v_i}) \in \mathcal{E} \right\} \) denotes the neighbor set of agent i. The diagonal matrix \(D = \mathrm{{diag}}\left\{ {{d_1},{d_2},\ldots ,{d_N}} \right\} \) is the in-degree matrix of \(\mathcal{G}\), and \(\mathcal{B} = \mathrm{{diag}}\left\{ {{b_1},{b_2},\ldots ,{b_N}} \right\} \) is the pinning matrix.
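As a concrete illustration of these graph objects, the following sketch builds the adjacency, in-degree, and pinning matrices for a small hypothetical digraph with three followers (the weights are illustrative, not the paper's topology):

```python
import numpy as np

# Hypothetical weighted adjacency matrix A = [a_ij] for 3 followers:
# a_ij > 0 means agent j is a neighbor of agent i (edge (v_j, v_i)).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])

# In-degree matrix D = diag{d_1, ..., d_N} with d_i = sum_j a_ij.
D = np.diag(A.sum(axis=1))

# Pinning matrix B = diag{b_1, ..., b_N}: b_i > 0 iff agent i
# receives the leader's signal (here only agent 1 is pinned).
B = np.diag([1.0, 0.0, 0.0])

print(D.diagonal())  # in-degrees: [1. 3. 1.]
```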
2.2 Problems Statement
Consider the second-order MASs with one leader and N followers. The dynamics of the followers is represented as \({x_{i:k + 1}} = A{x_{i:k}} + B{v_{i:k}},\ {v_{i:k + 1}} = C{v_{i:k}} + {T_i}{u_{i:k}},\)
where \({x_{i:k}}\), \({u_{i:k}}\) and \({v_{i:k}}\) denote the position state, the control input and the velocity state, respectively. The system matrices A, B, C and \({T_i}\) are assumed to be completely unknown constant matrices. The dynamics of the leader is given as \({x_{0:k + 1}} = A{x_{0:k}} + B{v_{0:k}},\ {v_{0:k + 1}} = C{v_{0:k}},\)
where \({x_{0:k}}\) and \({v_{0:k}}\) are the position state and the velocity state of the leader, respectively. The leader only generates desired signals for the followers to track; it does not receive signals from other agents.
Define the local neighbor tracking error as \(\delta _k^i = \sum \limits _{j \in {N_i}} {{a_{ij}}\left( {y_k^i - {\varGamma _{ij}}y_k^j} \right) } + {b_i}\left( {y_k^i - y_k^0} \right) ,\)
where \(y_k^i = {\left[ {\begin{array}{*{20}{c}}{x_k^i}&{v_k^i}\end{array}} \right] ^\mathrm{{T}}}\), \(y_k^0 = {\left[ {\begin{array}{*{20}{c}}{x_k^0}&{v_k^0}\end{array}} \right] ^\mathrm{{T}}}\), and \({\varGamma _{ij}}\) is the coopetition coefficient. We use \({\varGamma _{ij}} < 0\) to denote competitive interaction between agents i and j, while \({\varGamma _{ij}} > 0\) represents cooperative interaction.
Notes and Comments. Note that a single equilibrium state can no longer meet the requirements of complex environments and distributed tasks. Therefore, we divide the MASs into different subnets and achieve multiple consensus values by designing a reasonable coopetition coefficient.
where \(\varUpsilon = \left[ {\begin{array}{*{20}{c}}A&{}B\\ 0&{}C\end{array}} \right] \) and \({\mathrm{P}_i} = \left[ {\begin{array}{*{20}{c}}0\\ {{T_i}}\end{array}} \right] \). Based on (4) and (5), we have the iterative local neighbor tracking error as below
Then the global tracking error can be deduced as
where \(\bar{L} = D - A \circ \varGamma \), \({y_k} = {\left[ {y{{_k^1}^\mathrm{{T}}},...,y{{_k^N}^\mathrm{{T}}}} \right] ^\mathrm{{T}}}\) and \(\bar{y}_k^0 = {\left[ {y{{_k^0}^\mathrm{{T}}},...,y{{_k^0}^\mathrm{{T}}}} \right] ^\mathrm{{T}}}\); \(\varGamma = \left[ {{\varGamma _{ij}}} \right] \) is the coopetition coefficient matrix; \( \circ \) represents Hadamard product; \({\zeta _k} = {y_k} - \bar{y}_k^0\) is defined as global consensus error.
Assumption 1
Suppose that, for each agent i, the adjacency weights and coopetition coefficients satisfy \(\sum \limits _{j \in {N_i}} {{a_{ij}}(1 - {\varGamma _{ij}})} = 0\).
Based on the work of [12] and Assumption 1, we can know \(\mathop {\lim }\limits _{k \rightarrow \infty } \left\| {{\zeta _k}} \right\| = 0\) once \(\mathop {\lim }\limits _{k \rightarrow \infty } \left\| {{\delta _k}} \right\| = 0\) , i.e., the agents in each subnet will be synchronized with their group leader. Therefore, one of our goals is to minimize the global tracking error \({\delta _k}\) by ADP method.
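Numerically, Assumption 1 requires each agent's weighted coopetition mismatch to cancel. The sketch below forms \(\bar L = D - \mathcal{A} \circ \varGamma \) via the elementwise (Hadamard) product and checks the condition for a hypothetical one-row example (values chosen only so the assumption holds):

```python
import numpy as np

# Hypothetical example: agent 1 has neighbors 2 (competitive, Gamma < 0)
# and 3 (cooperative); the weights are chosen so Assumption 1 holds.
A = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
Gamma = np.array([[0.0, -1.0, 3.0],
                  [0.0,  0.0, 0.0],
                  [0.0,  0.0, 0.0]])

D = np.diag(A.sum(axis=1))   # in-degree matrix
L_bar = D - A * Gamma        # Hadamard product is elementwise '*'

# Assumption 1: sum_j a_ij (1 - Gamma_ij) = 0 for each agent i.
check = (A * (1.0 - Gamma)).sum(axis=1)
print(check)  # row 1: 1*(1-(-1)) + 1*(1-3) = 2 - 2 = 0
```

Note that under Assumption 1 the first row of \(\bar L\) sums to zero, which is what allows the leader terms to cancel in the global error.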
In order to find the optimal control \(u_i^ * \), the discounted value function is designed with the local neighbor tracking errors as
where \({U_i}(\delta _k^i,{u_k}) = \delta _k^{i\mathrm{{T}}}{Q_{ii}}\delta _k^i + u_k^{i\mathrm{{T}}}{R_{ii}}u_k^i + \sum \limits _{j \in {N_i}} {u_k^{j\mathrm{{T}}}{R_{ij}}u_k^j} \) denotes the reward function; \(\alpha \in \left( {0,\left. 1 \right] } \right. \) is the discount factor; \({Q_{ii}} \ge 0\) and \({R_{ij}} \ge 0\) are symmetric positive semi-definite weighting matrices, and \({R_{ii}} > 0\) is a symmetric positive definite weighting matrix.
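The reward function \(U_i\) is a sum of quadratic forms and is cheap to evaluate. A minimal numerical sketch (the dimensions and weight matrices are illustrative, not taken from the paper):

```python
import numpy as np

def stage_cost(delta_i, u_i, u_neighbors, Q_ii, R_ii, R_ij_list):
    """U_i = delta^T Q_ii delta + u_i^T R_ii u_i + sum_j u_j^T R_ij u_j."""
    cost = delta_i @ Q_ii @ delta_i + u_i @ R_ii @ u_i
    for u_j, R_ij in zip(u_neighbors, R_ij_list):
        cost += u_j @ R_ij @ u_j
    return float(cost)

# Illustrative 2-D tracking error with scalar inputs.
delta = np.array([1.0, -2.0])
U = stage_cost(delta, np.array([0.5]), [np.array([1.0])],
               Q_ii=np.eye(2), R_ii=np.eye(1), R_ij_list=[np.eye(1)])
print(U)  # 1 + 4 + 0.25 + 1 = 6.25
```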
By Bellman optimality principle, we can know the optimal discounted value function \(J_i^*\left( {\delta _k^i} \right) \) satisfies the following discrete-time Hamilton-Jacobi-Bellman (DT-HJB) equation
Under the first-order necessary condition \(\partial J_i^*\left( {\delta _k^i} \right) /\partial u_k^i = 0\), the optimal control policy can be given as
3 Main Results
3.1 Event-Triggered Framework
The traditional ADP method usually adopts periodic sampling, which causes a huge computational burden and a waste of resources as the system scale and complexity increase. Therefore, we introduce the event-triggered mechanism into the ADP method to reduce the sampling frequency. The measurement error is defined as
where \(k_s^i\) is the sth triggering instant of agent i. The monotonically increasing sequence of triggering instants is determined by
where F and \({\sigma _i}\) are positive constants, \({\hat{\omega }_{aj}}\) is the weight of the actor network of agent j, \({\psi _{aj}}( \cdot ) = \tanh ( \cdot )\) is the activation function, and \({z_{aj}}\) is the input vector composed of \({\delta _j}(k_s^j)\).
The above condition (11) is a static event-triggered condition with constant threshold parameters that must be determined by the operation engineer or designer. However, this may cause agents to communicate unnecessarily as consensus is approached; that is, the event-triggered condition (11) is over-triggered. Taking this into account, we propose a dynamic event-triggered law that introduces an external threshold. The triggering instants sequence is determined by the following dynamic event-triggered condition
where F, \({\sigma _i}\), \({\hat{\omega }_{aj}}\), \({\psi _{aj}}( \cdot )\) and \({z_{aj}}\) are defined as in (11), \({\theta _i}\) is a positive constant, and \(\eta _k^i\) is an external threshold satisfying
with \(\eta _0^i > 0\) and \({\rho _i} > 0\).
Notes and Comments. In (12), \(\eta _k^i\) varies in real time according to the measurement error, the local neighbor tracking error and the weights of the actor network. Compared with the static event-triggered condition (11), the dynamic one is essentially a static triggering mechanism equipped with “history” information. If \(\eta _k^i\) is set to 0, the dynamic event-triggered condition in (12) reduces to the static one, which can thus be seen as a special case. Note that the external threshold \(\eta _k^i\) alleviates the excessive-triggering problem caused by static trigger conditions, thus saving communication and computing resources.
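The exact threshold functions in (11)–(13) involve the actor-network output and are not reproduced here. The sketch below therefore uses a simplified stand-in threshold \(\sigma \left\| \delta _k \right\| ^2\) (an assumption for illustration only, not the paper's condition) to show the mechanics: since the internal variable \(\eta _k\) is kept nonnegative, every dynamic trigger is also a static trigger, so the dynamic rule fires no more often.

```python
sigma, theta, rho = 0.72, 0.6, 0.59   # values echo the simulation section
delta, e, eta = 1.0, 0.0, 1.0
static_count = dynamic_count = 0

for k in range(50):
    delta *= 0.9               # tracking error shrinks toward consensus
    e += 0.05 * delta          # measurement error accumulates between events
    thresh = sigma * delta**2  # stand-in static threshold (assumed form)
    if e**2 >= thresh:         # static rule (counting only; a real controller
        static_count += 1      #  would re-sample the state here)
    if e**2 >= thresh + eta / theta:   # dynamic rule: extra slack must be used up
        dynamic_count += 1
    # eta carries "history": leftover margin decaying at rate rho (assumed form)
    eta = max(rho * eta + (thresh - e**2), 0.0)

# Since eta >= 0, the dynamic rule can never fire when the static rule does not.
print(static_count, dynamic_count)
```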
3.2 Implementation of the ADP Method
In order to find the optimal discounted value function and control policy, the online-learning policy iteration (PI) algorithm is presented.
PI Algorithm: Let \({k_s} = k = 0\) and \(s = 0\); choose an initial admissible control policy \(u_0^i\) and a sufficiently small positive number \({\iota _i}\).
1) Compute the measurement error and the tracking error.

2) If the dynamic event-triggered condition (12) is satisfied, compute the iterative local value function by (7).

3) Set \(s = s + 1\), \({k_s} = k\). Update the control policy by (10).

4) If \(\left| {{J_i}\left( {\delta _k^i} \right) - {J_i}\left( {\delta _{k + 1}^i} \right) } \right| < {\iota _i}\), stop; otherwise set \(k = k + 1\) and go back to step 1).

5) Return \(u_{{k_s}}^i\) and \({J_i}\left( {\delta _k^i} \right) \).
Through continuous iteration of the above algorithm, \({u^*}\) can finally be obtained. The proof of convergence of the above PI algorithm is similar to the work in [13], so it is omitted in this paper.
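The PI algorithm above is event-triggered and model-free. As a sanity check of the underlying policy-iteration idea, the following sketch runs classical PI on a scalar discounted linear-quadratic problem (a simplified analogue under assumed scalar dynamics, not the paper's data-driven algorithm), alternating policy evaluation and policy improvement until the value stops changing:

```python
# Scalar discounted LQ analogue: x_{k+1} = a x_k + b u_k, stage cost
# q x^2 + r u^2, discount alpha. Policy u = K x has value V(x) = P x^2.
a, b, q, r, alpha = 0.9, 0.5, 1.0, 1.0, 0.95

K, P_prev = 0.0, 0.0          # K = 0 is admissible since alpha*a^2 < 1
for _ in range(100):
    # Policy evaluation: P = q + r K^2 + alpha (a + b K)^2 P, closed form.
    P = (q + r * K**2) / (1.0 - alpha * (a + b * K)**2)
    # Policy improvement: minimize r u^2 + alpha P (a x + b u)^2 over u.
    K = -alpha * b * P * a / (r + alpha * b**2 * P)
    if abs(P - P_prev) < 1e-10:
        break
    P_prev = P

# At convergence, P satisfies the discounted algebraic Riccati equation.
residual = (q + alpha * a**2 * P
            - (alpha * a * b * P)**2 / (r + alpha * b**2 * P) - P)
print(P, residual)
```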
Since the discrete-time HJB equation is very difficult to solve, we use an actor-critic neural network to obtain an approximate optimal solution of the equation within the PI algorithm. Figure 1 shows the structure diagram of the event-triggered optimal consensus control network. When the event-triggered condition is satisfied, the current time and state are recorded as the sampling time and state, respectively, and then sent to the controller. The optimal control policy is approximated by the actor-critic network.
(1) Critic NN
The discounted value function is approximated by the following critic network
where \({\hat{\omega }_{ci}}\) is the weight of the critic network; \({z_{ci}}\) is the input vector composed of \(\delta _{{k_s}}^i\), \(u_{{k_s}}^i\) and \(u_{{k_s}}^{{N_i}}\); and \({\psi _{ci}}( \cdot ) = \tanh ( \cdot )\) is the activation function.
Based on the discounted value function (7) and approximation discounted value function (14), we define the error function as
where \({\vartheta _{ci}}(k) = {U_i}(\delta _k^i,{u_{{k_s}}}) + \alpha {\hat{J}_i}(k_s^i + 1) - {\hat{J}_i}(k_s^i)\). The goal is to minimize this error function by adjusting the weights of the neural network. The gradient-based weight update policy is derived as
(2) Actor NN
The control policy is approximated by the following actor network
where \(\hat{\omega }_{ai}\), \({\psi _{ai}}( \cdot )\) and \({z_{ai}}\) are defined as in (11). The output error function with the approximate control policy (17) is defined as
with \({\vartheta _{ai}}(k) = \hat{u}_k^i - \tilde{u}_k^i\) and \(\tilde{u}_k^i\) is the target of the actor network. The actor network weight update policy can be derived by the gradient descent method as
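One combined update of the critic and actor rules described above can be sketched as a plain gradient step. The feature vectors, stage reward, target control, and learning rates below are illustrative assumptions, not the paper's network; the point is that a single small step shrinks both the temporal-difference error and the actor output error:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                   # number of hidden features (assumed)
alpha, kappa_c, kappa_a = 0.95, 0.03, 0.03

# Fixed inputs for one event-triggered sample (illustrative values).
z_now, z_next = rng.normal(size=n), rng.normal(size=n)
psi_now, psi_next = np.tanh(z_now), np.tanh(z_next)   # tanh activations
U = 1.3                                               # stage reward U_i
w_c = rng.normal(size=n)                              # critic weights

def td_error(w):
    # e_c = U + alpha * J_hat(next) - J_hat(now), with J_hat = w^T tanh(z)
    return U + alpha * w @ psi_next - w @ psi_now

# Critic: one gradient-descent step on (1/2) e_c^2.
e0 = td_error(w_c)
w_c = w_c - kappa_c * e0 * (alpha * psi_next - psi_now)
e1 = td_error(w_c)

# Actor: u_hat = w_a^T tanh(z_a); one step toward a target control u_tilde.
w_a, z_a, u_tilde = rng.normal(size=n), rng.normal(size=n), 0.2
psi_a = np.tanh(z_a)
ea0 = w_a @ psi_a - u_tilde
w_a = w_a - kappa_a * ea0 * psi_a
ea1 = w_a @ psi_a - u_tilde

print(abs(e1) < abs(e0), abs(ea1) < abs(ea0))  # True True
```

Because the activations are bounded in \((-1, 1)\) and the step sizes are small, each error is multiplied by a factor strictly between 0 and 1, mirroring the boundedness argument used in the stability analysis below.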
3.3 Stability Analysis
We analyze the stability of the MASs, which combines the event-triggered mechanism and the neural network structure, in two cases according to whether the events are triggered or not. To facilitate the following analysis, we first define two approximation errors. The critic network weight approximation error \({\tilde{\omega }_{ci}}(k_s^i)\) is defined as
where \(\omega _{ci}^ * \) is the target weight of the critic network. The actor network weight approximation error \({\tilde{\omega }_{ai}}(k_s^i)\) is defined as
where \(\omega _{ai}^ * \) is the target weight of the actor network. Then, when the measurement error of the MASs satisfies the proposed dynamic event-triggered condition (12), we have the following theorem.
Theorem 1
When the dynamic event-triggered condition is satisfied, the MASs updates the control policy of the current agent and the weights of the neural network associated with it. The approximation errors \({\tilde{\omega }_{ci}}(k_s^i)\) and \({\tilde{\omega }_{ai}}(k_s^i)\) can be guaranteed to be UUB.
Proof
Define the following Lyapunov function
where \({L_{i1}}(k_s^i) = \tilde{\omega }_{ci}^T(k_s^i){\tilde{\omega }_{ci}}(k_s^i)\) and \({L_{i2}}(k_s^i) = \tilde{\omega }_{ai}^T(k_s^i){\tilde{\omega }_{ai}}(k_s^i)\). According to (15) and (16), we can get
where \(\varDelta {\psi _{ci}}({z_{ci}}(k)) = {\psi _{ci}}({z_{ci}}(k + 1)) - {\psi _{ci}}({z_{ci}}(k))\), \({\lambda _1}\) is an eigenvalue of \(\alpha {\kappa _{ci}}{\psi _{ci}}({z_{ci}}(k))(1 - \alpha )\varDelta \psi _{ci}^T({z_{ci}}(k))\) and \({\lambda _1} \in \left( { - \infty ,1} \right) \) can be designed by choosing the appropriate \({\kappa _{ci}}\), and \({\varepsilon _1} = \alpha {\kappa _{ci}}{\psi _{ci}}({z_{ci}}(k))\left( (\alpha - 1)\varDelta \psi _{ci}^T({z_{ci}}(k))\omega _{ci}^ * -\right. {}\) \(\left. {U_i}(\delta _k^i,{u_{{k_s}}}) \right) \). \(\varDelta {\psi _{ci}}({z_{ci}}(k))\) and \({\varepsilon _1}\) are bounded since \({\psi _{ci}}({z_{ci}}(k))\) is bounded. Then the difference of \({L_{i1}}(k_s^i)\) is given by
Based on the work in [2], the following condition can be satisfied when \({\lambda _1} = 0\).
Since \(\left\| {{{\tilde{\omega }}_{ci}}(k_s^i)} \right\| > 0\), we have \(\varDelta {L_{i1}}(k_s^i) < 0\). Then the difference of \({L_{i2}}(k_s^i)\) is given by
Due to \({\kappa _{ai}} \in \left( {0,1} \right) \), \({\psi _{ai}}\left( \cdot \right) \in \left( { - 1,1} \right) \) and \(\left\| {{{\tilde{\omega }}_{ai}}(k_s^i)} \right\| > 0\), the difference \(\varDelta {L_{i2}}(k_s^i) < 0\). Based on (25) and (26), \(\varDelta {L_i}(k_s^i) = \varDelta {L_{i1}}(k_s^i) + \varDelta {L_{i2}}(k_s^i) < 0\). That is, \({\tilde{\omega }_{ci}}(k_s^i)\) and \({\tilde{\omega }_{ai}}(k_s^i)\) are UUB and the system is ultimately bounded at the event-triggered instants. The proof is completed.
Assumption 2
There exists a constant \(F > 0\) such that \(\left\| {\delta _{k + 1}^i} \right\| \le F\left\| {\delta _k^i} \right\| + F\left\| {e_k^i} \right\| \).
Under the above assumptions, when the measurement error of the MASs does not meet the proposed dynamic event-triggered condition, we have the following theorem.
Theorem 2
When the measurement error of agent i does not meet the condition (12), the agent will not update its relevant control policy or the weights of the neural network. The MASs can be guaranteed to be asymptotically stable if there exist positive scalars \({\sigma _i}\), F, \({\rho _i}\) and \({\theta _i}\) such that
Proof
Consider the following Lyapunov function
Since the dynamic event-triggered condition (12) is not satisfied, we have
Based on (13), (29) and Assumption 2, the difference of \(L_i^2(k)\) can be derived as
When \({\sigma _i}\), F, \({\rho _i}\) and \({\theta _i}\) satisfy the condition (27), it can be known that \(\varDelta L_i^2(k) \le 0\) and the system is asymptotically stable. The proof is completed.
4 Simulation
We verify the effectiveness of the proposed ADP method under the dynamic event-triggered mechanism through simulation experiments with three trigger modes, namely, time trigger, static event trigger and dynamic event trigger. Consider a MASs (4) with the topological structure of Fig. 2. In practical applications, each UAV, robot and so on can be abstracted as an agent node. The initial parameters in the event-triggered mechanism are given as follows. Pinning gains: \({b_1} = 1\), \({b_2} = {b_3} = {b_4} = {b_5} = {b_6} = {b_7} = 0\). Coopetition coefficients: \({\varGamma _{21}} = -0.1\), \({\varGamma _{25}} = 2.1\), \({\varGamma _{31}} = 1\), \({\varGamma _{42}} = 1\), \({\varGamma _{54}} = 1\), \({\varGamma _{63}} = 3\), \({\varGamma _{67}} = -1\). Discount factor: \(\alpha = 0.95\); learning rates: \({\kappa _{ci}} = {\kappa _{ai}} = 0.03\). Event-triggered parameters: \({\theta _i} = 0.6\), \({\rho _i} = 0.59\), \(F = 0.729\), \({\sigma _i} = 0.72\). In the simulation experiments under the three trigger modes, the agents start from the same initial positions and velocities, and the MASs finally reaches bipartite consensus.
Figure 3 shows that under the designed dynamic event-triggered mechanism, the control policies remain unchanged for certain periods because of the threshold we set. Figures 4 and 5 show that even though the event trigger reduces the communication among agents, the system still reaches bipartite consensus. Under the proposed event-triggered modes, whenever the measurement error violates the event-triggered condition (11) or (12), the agent is triggered and the control policy is updated. Therefore, compared with the time-triggered mechanism, in which every instant is triggered, the event-triggered mechanisms proposed in this paper can significantly reduce resource consumption. Moreover, it is clear from Fig. 6 that, owing to the external threshold added to the dynamic event-triggered condition, its effect is better than that of the static event-triggered mechanism.
5 Conclusion
In this paper, the optimal tracking control problem of discrete second-order MASs under the event-triggered mechanism is investigated. We propose a dynamic event-triggered mechanism to solve the problem of excessive triggering caused by static event-triggered conditions. Then, under this mechanism, the optimal control problem is studied using the ADP method, and the stability of the system is proved.
References
Luo, B., Yang, Y., Liu, D., Wu, H.: Event-triggered optimal control with performance guarantees using adaptive dynamic programming. IEEE Trans. Neural Netw. Learn. Syst. 31(1), 76–88 (2020)
Yang, W., Wei, Q., Liu, D.: Event-triggered adaptive dynamic programming for discrete-time multi-player games. Inf. Sci. 506, 457–470 (2019)
Sahoo, A., Xu, H., Jagannathan, S.: Near optimal event-triggered control of nonlinear discrete-time systems using neurodynamic programming. IEEE Trans. Neural Netw. Learn. Syst. 27(9), 1801–1815 (2016)
Wei, Z., Zhang, H.: Distributed optimal coordination control for nonlinear multi-agent systems using event-triggered adaptive dynamic programming method. ISA Trans. 91, 184–195 (2019)
He, W., Xu, B., Han, Q.-L., Qian, F.: Adaptive consensus control of linear multiagent systems with dynamic event-triggered strategies. IEEE Trans. Cybern. 50(7), 2996–3008 (2020)
Yu, J., Wang, L.: Group consensus of multi-agent systems with undirected communication graphs. In: 2009 7th Asian Control Conference, pp. 105–110 (2009)
Qin, J., Yu, C.: Group consensus of multiple integrator agents under general topology. In: 52nd IEEE Conference on Decision and Control, pp. 2752–2757 (2013)
Xie, D., Liang, T.: Second-order group consensus for multi-agent systems with time delays. Neurocomputing 153, 133–139 (2015)
Ji, L., Yu, X., Li, C.: Group consensus for heterogeneous multiagent systems in the competition networks with input time delays. IEEE Trans. Syst. Man Cybern. Syst. 50(11), 4655–4663 (2020)
Liu, J., Shi, T., Li, P., Ren, X., Ma, H.: Trajectories planning for multiple UAVs by the cooperative and competitive PSO algorithm. In: 2015 IEEE Intelligent Vehicles Symposium (IV), pp. 107–114 (2015)
Feng, F., Xu, Y., Tang, Z.: Research on the charge rate of railway value-guaranteed transportation based on competitive and cooperative relationships. Adv. Mech. Eng. 10(1), 168781401774769 (2018)
Li, J., Ji, L., Li, H.: Optimal consensus control for unknown second-order multi-agent systems: using model-free reinforcement learning method. Appl. Math. Comput. 410, 126451 (2021). https://doi.org/10.1016/j.amc.2021.126451
Peng, Z., Hu, J., Ghosh, B.K.: Optimal tracking control of heterogeneous multi-agent systems with switching topology via actor-critic neural networks. In: 2018 37th Chinese Control Conference (CCC), pp. 7037–7042 (2018)
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61876200, 62072066 and 62006031, in part by the Foundation of Guangxi Key Laboratory of Cryptography and Information Security (GCIS201908), in part by the Major Scientific and Technological Research Program of Chongqing Municipal Education Commission under Grant No. KJZD-M202100602, and in part by the Natural Science Foundation Project of Chongqing Science and Technology Commission under Grant Nos. cstc2018jcyjAX0112 and cstc2019jcyj-msxmX0545.
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Li, X., Ji, L., Yang, S., Wang, Y. (2023). Optimal Group Consensus Control for Multi-agent Systems in Coopetition Networks via Dynamic Event-Triggered Methods. In: Ren, Z., Wang, M., Hua, Y. (eds) Proceedings of 2021 5th Chinese Conference on Swarm Intelligence and Cooperative Control. Lecture Notes in Electrical Engineering, vol 934. Springer, Singapore. https://doi.org/10.1007/978-981-19-3998-3_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-3997-6
Online ISBN: 978-981-19-3998-3