Abstract
During short-range air combat involving unmanned aerial vehicle (UAV) swarms, each UAV must make accurate maneuver decisions based on information from both enemy and friendly UAVs. This dual requirement of competition and cooperation presents a significant challenge in unmanned air combat. In this paper, a method based on multi-agent reinforcement learning (MARL) is proposed to address this issue. An actor network is designed with three subnetworks, each handling a different type of situational information. This structure allows results from simpler one-on-one scenarios to be transferred to, and to enhance, the more complex swarm air combat training process. Separate state spaces are designed for the actor and critic networks, covering local and global information, respectively. A detailed, dense reward function is proposed to encourage participation. To prevent lazy participants in air combat, a reward assignment operation is applied to distribute these dense rewards among the UAVs. Simulation tests and ablation experiments show that both the transfer operation and the reward assignment operation are effective in the swarm air combat scenario, supporting the effectiveness of the proposed method.
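The abstract does not specify how the reward assignment operation distributes rewards, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical rule in which a shared team reward is split in proportion to a per-UAV contribution score, so that a UAV that contributes nothing receives nothing. The function name `assign_rewards` and the contribution scores are assumptions introduced for illustration.

```python
from typing import List, Sequence


def assign_rewards(team_reward: float, contributions: Sequence[float]) -> List[float]:
    """Split a shared team reward in proportion to each agent's contribution.

    Hypothetical rule for illustration only; the paper's actual reward
    assignment operation is not described in the abstract.
    """
    if not contributions:
        return []
    # Negative contribution scores are treated as zero contribution.
    clipped = [max(c, 0.0) for c in contributions]
    total = sum(clipped)
    if total == 0.0:
        # No agent contributed: fall back to an equal split.
        return [team_reward / len(clipped)] * len(clipped)
    return [team_reward * c / total for c in clipped]


if __name__ == "__main__":
    # Three UAVs; the third did not participate and receives no reward.
    print(assign_rewards(10.0, [3.0, 1.0, 0.0]))
```

Under this assumed rule, an idle UAV cannot benefit from its teammates' success, which is one simple way to discourage lazy behavior in a cooperative setting.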
Acknowledgements
This work was supported by the National Key R&D Program of China (Grant No. 2023YFC3011001) and the National Natural Science Foundation of China (Grant Nos. U20B2071, 62350048, T2121003).
Cite this article
Zheng, Z., Wei, C. & Duan, H. UAV swarm air combat maneuver decision-making method based on multi-agent reinforcement learning and transferring. Sci. China Inf. Sci. 67, 180204 (2024). https://doi.org/10.1007/s11432-023-4088-2
DOI: https://doi.org/10.1007/s11432-023-4088-2