
Behavioral decision learning reward mechanism of unmanned swarm system

ZHANG Tingting, LAN Yushi, SONG Aiguo

Citation: ZHANG Tingting, LAN Yushi, SONG Aiguo, et al. Behavioral decision learning reward mechanism of unmanned swarm system[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(12): 2442-2451. doi: 10.13700/j.bh.1001-5965.2020.0600 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2020.0600

    Corresponding author:

    ZHANG Tingting. E-mail: 101101964@seu.edu.cn

  • CLC number: TP181


Funds: 

National Natural Science Foundation of China 61802428

China Postdoctoral Science Foundation 2019M651991

National Defense Science and Technology Fund of the Science and Technology Commission of the Central Military Commission 2019-JCJQJJ-014

  • Abstract:

    The future of combat lies in unmanned swarm systems, built as multi-agent systems, accomplishing missions through autonomous coordination among agents. Because each agent takes actions and changes state autonomously, training swarm behavior policies is unstable. This work strengthens the immediacy of the reward signal through prior constraints and the homogeneity of the agents, improving training efficiency and learning stability. It adopts penalties for collisions with action-space boundaries and rewards for the degree to which spatiotemporal distance constraints between agents are satisfied; exploiting the relational structure of agents within the swarm, it adds experience sharing among agents to further improve learning efficiency. In experiments, the prior-enhanced reward mechanism and experience sharing are applied to the multi-agent deep deterministic policy gradient (MADDPG) algorithm to verify their effectiveness. The results show that learning convergence and stability improve substantially, raising the behavior learning efficiency of the unmanned swarm system.
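
    As a concrete reading of the reward design described above, the following Python sketch combines an action-space boundary-collision penalty with a graded reward for satisfying inter-agent distance constraints. All names and magnitudes here (pos, bounds, d_min, d_max, the -10/+1 scales) are illustrative assumptions, not values from the paper:

```python
import numpy as np

def prior_reward(pos, others, bounds=1.0, d_min=0.1, d_max=0.5):
    """Prior-enhanced shaping reward (illustrative sketch): penalize
    leaving the action-space boundary, reward satisfied inter-agent
    spatiotemporal distance constraints."""
    r = 0.0
    # Boundary-collision penalty: the agent strayed outside the arena.
    if np.any(np.abs(pos) > bounds):
        r -= 10.0
    # Distance-constraint satisfaction reward, graded by violation size.
    for q in others:
        d = np.linalg.norm(pos - q)
        if d_min <= d <= d_max:
            r += 1.0
        else:
            r -= min(abs(d - d_min), abs(d - d_max))
    return r
```

    Because both terms are computable at every step from the agents' positions alone, this kind of shaping gives a dense, real-time signal instead of a sparse terminal reward.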

     

  • Figure 1.  Geometric predation-escape model

    Figure 2.  Training execution of MADDPG algorithm

    Figure 3.  Training framework of MADDPG algorithm (a minimal update sketch follows this list)

    Figure 4.  Curves of Predator 1 reward function

    Figure 5.  Curves of Predator 2 reward function

    Figure 6.  Curves of Predator 3 reward function

    Figure 7.  Curves of escaper reward function

    Figure 8.  Sum of reward function curves

    Figure 9.  Reward function convergence comparison among MADDPG, PD-MADDPG and PES-MADDPG algorithms

    Figure 10.  Tracks of both parties under confrontation mission

    Figure 11.  Predator UAV tracks

    Figure 12.  Escaper UAV tracks

    Figure 13.  Result of agent 3V1 roundup

    Figure 14.  Result of agent 20V6 roundup
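
    Figures 2 and 3 depict MADDPG's centralized-training, decentralized-execution scheme: each agent's critic is trained on the joint observations and actions of all agents, while each actor acts on local observations only. The sketch below shows one such centralized critic update in PyTorch; all names, shapes, and hyperparameters (e.g. gamma=0.95) are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def maddpg_critic_update(critic_i, target_critic_i, target_actors,
                         optimizer, batch, gamma=0.95):
    """One centralized critic update for agent i (illustrative sketch).

    batch holds per-agent lists of tensors: observations, actions,
    agent i's rewards, next observations, and done flags."""
    obs, acts, rew_i, next_obs, done = batch
    with torch.no_grad():
        # Target actors pick next actions from their own observations only.
        next_acts = [pi(o) for pi, o in zip(target_actors, next_obs)]
        # The centralized target Q sees everyone's observations and actions.
        target_q = target_critic_i(torch.cat(next_obs + next_acts, dim=-1))
        y = rew_i + gamma * (1.0 - done) * target_q
    q = critic_i(torch.cat(obs + acts, dim=-1))
    loss = F.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```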

    Table 1.  Reward mechanism setting

    Agent       Collision   No collision
    Predator    +10         -1
    Escaper     -10         +1
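
    Table 1's collision payoffs translate directly into a per-step reward. A minimal sketch, where the boolean flags are assumed inputs (e.g. from the simulator's collision check):

```python
def collision_reward(is_predator: bool, collided: bool) -> float:
    """Per-step reward from Table 1: predators gain from collisions
    (captures), escapers lose from them."""
    if is_predator:
        return 10.0 if collided else -1.0
    return -10.0 if collided else 1.0
```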

    Table 2.  Experimental scenario of 3 versus 1 confrontation

    Scenario     Adversarial   Agents            Winning conditions
    Simple_tag   Yes           3 red vs 1 blue   Blue: the blue agent avoids collisions with the 3 red agents as far as possible
                                                 Red: the 3 red agents coordinate as much as possible to collide with the blue agent
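
    The Simple_tag scenario in Table 2 comes from the multi-agent particle environments. Below is a minimal rollout sketch using the PettingZoo port of that scenario; the paper's exact environment version and wrappers are assumptions here:

```python
from pettingzoo.mpe import simple_tag_v3

# 3 red predators (adversaries) vs 1 blue escaper (good agent), per Table 2.
env = simple_tag_v3.parallel_env(num_good=1, num_adversaries=3,
                                 continuous_actions=True)
observations, infos = env.reset(seed=0)

while env.agents:
    # Random placeholder actions; trained MADDPG actors would go here.
    actions = {a: env.action_space(a).sample() for a in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```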

    Table 3.  Average number of collisions per step

    Algorithm   Reward mechanism improved   Average collisions per step
    MADDPG                                  0.538
    MADDPG                                  0.521
    DDPG                                    0.532
    DDPG                                    0.523
Figures (14) / Tables (3)
Publication history
  • Received: 2020-10-23
  • Accepted: 2021-04-23
  • Published online: 2021-12-20
