Citation: ZHANG Tingting, LAN Yushi, SONG Aiguo, et al. Behavioral decision learning reward mechanism of unmanned swarm system[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(12): 2442-2451. doi: 10.13700/j.bh.1001-5965.2020.0600 (in Chinese)
An unmanned swarm system is a multi-agent system that meets task requirements through autonomous and cooperative behavior. Because each agent chooses its actions and changes state autonomously, agent training becomes less stable. This paper uses prior constraints and the isomorphism between agents to make reward signals more timely, which improves training efficiency and learning stability. Specifically, the reward mechanism adds a penalty for collisions with the boundary of the action space and a reward for the degree to which the space-time distance constraints between agents are satisfied. In addition, the relationships among agents in the group are used to increase experience sharing among agents, which further improves learning efficiency. In the experiments, the prior-enhanced reward mechanism and experience sharing are applied to the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to verify their effectiveness. The results show that learning convergence and stability are greatly improved, which in turn improves the behavior learning efficiency of the unmanned swarm system.
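The two ideas in the abstract, a prior-based reward and experience sharing among isomorphic agents, can be illustrated with a minimal Python sketch. This is not the paper's formulation: the function `shaped_reward`, the class `SharedReplayBuffer`, the penalty and bonus weights, the rectangular 2-D workspace, and the purely spatial distance band `[d_min, d_max]` are all hypothetical choices made for illustration. The abstract does not state the actual reward formula or how the time component of the constraint is handled.

```python
import math
import random
from collections import deque


def shaped_reward(base_reward, pos, others, bounds, d_min, d_max,
                  boundary_penalty=1.0, spacing_bonus=0.5):
    """Add prior-based terms to a task reward (illustrative weights only)."""
    (x_lo, x_hi), (y_lo, y_hi) = bounds
    x, y = pos
    reward = base_reward
    # Assumed prior 1: penalize touching or leaving the workspace boundary.
    if x <= x_lo or x >= x_hi or y <= y_lo or y >= y_hi:
        reward -= boundary_penalty
    # Assumed prior 2: reward the fraction of other agents whose distance
    # to this agent lies inside the allowed band [d_min, d_max].
    if others:
        satisfied = sum(1 for o in others
                        if d_min <= math.dist(pos, o) <= d_max)
        reward += spacing_bonus * satisfied / len(others)
    return reward


class SharedReplayBuffer:
    """A single buffer that all isomorphic agents write to and sample from."""

    def __init__(self, capacity=10000, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, obs, action, reward, next_obs, done):
        # Any agent may store its transition here.
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

In this sketch the shaping terms are added at every step, so an agent receives feedback before the task reward arrives, and every agent's learner draws from the same pool of transitions. How these terms are weighted and combined inside MADDPG in the paper itself is not given in the abstract.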