UAV formation control based on dueling double DQN (D3QN)

ZHAO Qi (赵启), ZHEN Ziyang (甄子洋), GONG Huajun (龚华军), CAO Hongbo (曹红波), LI Rong (李荣), LIU Jicheng (刘继承)

Citation: ZHAO Q, ZHEN Z Y, GONG H J, et al. UAV formation control based on dueling double DQN[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(8): 2137-2146 (in Chinese). doi: 10.13700/j.bh.1001-5965.2021.0601


doi: 10.13700/j.bh.1001-5965.2021.0601
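Funds: National Natural Science Foundation of China (61973158); Postgraduate Research & Practice Innovation Program of Nanjing University of Aeronautics and Astronautics (kfjj20200310, kfjj20200311)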
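    About the authors:

    ZHAO Qi (赵启): male, master's student. Research interests: UAV formation control, reinforcement learning.

    ZHEN Ziyang (甄子洋): male, Ph.D., professor, doctoral supervisor. Research interests: landing guidance and control of carrier-based aircraft and UAVs; cooperative formation control and decision-making for UAV swarms.

    GONG Huajun (龚华军): male, Ph.D., professor, doctoral supervisor. Research interests: advanced flight control technology, integrated flight control, system modeling and simulation.

    Corresponding author:

    E-mail: zhenziyang@nuaa.edu.cn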

  • CLC number: V249.1

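  • Abstract:

    Controller design for unmanned aerial vehicle (UAV) formations typically depends on model information, and the UAVs themselves have a low level of intelligence. To address both problems, deep reinforcement learning is applied to formation control. The reinforcement learning elements are designed for the formation control problem, and a formation controller is built on the dueling double deep Q-network (D3QN) algorithm. A method that combines a priority selection strategy with a multi-layer action library is also proposed; it speeds up convergence and allows the follower to eventually hold the desired distance. The designed controller is compared in simulation with a PID controller and a Backstepping controller to verify its effectiveness. The simulation results show that the controller can be applied to UAV formations, raises the follower's level of intelligence, and learns autonomously to hold the desired distance. Its design does not require precise model information, which provides a basis and reference for intelligent UAV formation control.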
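    The abstract names the algorithm but not its update rule, so the following is a minimal sketch of the two generic ingredients that "dueling double DQN" usually refers to: the mean-subtracted dueling aggregation and the double DQN target. It is not the authors' controller. The function names are hypothetical, the Q-values are plain lists rather than network outputs, and the default discount of 0.95 assumes that $\gamma$ in Table 1 is the discount factor.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.95,  # assumed meaning of gamma in Table 1
                      done: bool = False) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    q_online = dueling_q(1.0, [0.5, -0.5, 0.0])
    q_target = [0.2, 2.0, 0.4]
    print(q_online)
    print(double_dqn_target(1.0, q_online, q_target))
```

    In this toy case the online network prefers action 0, so the target bootstraps from the target network's estimate for action 0 rather than from its own maximum (action 1). Decoupling selection from evaluation in this way is what reduces the overestimation of plain DQN.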

     

  • Figure 1.  Learning target of follower

    Figure 2.  Control structure of D3QN formation control

    Figure 3.  Structure of D3QN neural network

    Figure 4.  Action selection strategy comparison

    Figure 5.  D3QN and DQN average rewards comparison

    Figure 6.  Change of states of follower (lead aircraft horizontal flight)

    Figure 7.  Flight command of leader

    Figure 8.  Change of states of follower (lead aircraft change flight)

    Table 1.  Parameter settings

    Parameter                          Value
    $\mathrm{epi}_{\max}$              10000
    $\mathrm{Maxstep}$                 200
    $\psi_{\max}$                      180
    $\mathrm{batchsize}$               128
    $\varepsilon$                      1→0.1
    $\alpha$                           0.0001
    $n$                                1
    $\mu_1, \mu_2$                     50
    $\tau_{\psi a}$                    0.919
    $T_{\mathrm{s}}$                   0.5
    $T_{\mathrm{d}}$                   200
    $M$                                $1\times10^{6}$
    $\gamma$                           0.95
    $m$                                8
    $\tau$                             0.9
    $\sigma_1, \sigma_2, \sigma_3$     1
    $\mu_3$                            20
    $\tau_{\psi b}$                    0.919
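    Table 1 lists the exploration rate as 1→0.1 but this page does not say how it is annealed. The sketch below assumes a linear schedule over $\mathrm{epi}_{\max}$ = 10000 episodes and pairs it with plain ε-greedy selection. Both are assumptions made for illustration: the paper's own priority selection strategy and multi-layer action library are not reproduced here, and the function names are hypothetical.

```python
import random
from typing import Sequence


def epsilon_at(episode: int,
               total_episodes: int = 10000,  # epi_max from Table 1
               eps_start: float = 1.0,
               eps_end: float = 0.1) -> float:
    """Linearly anneal epsilon from eps_start to eps_end (assumed schedule)."""
    frac = min(max(episode / total_episodes, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values: Sequence[float],
                  epsilon: float,
                  rng: random.Random) -> int:
    """Plain epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in (0, 5000, 10000):
        eps = epsilon_at(episode)
        print(episode, eps, select_action([0.1, 0.9, 0.3], eps, rng))
```

    With epsilon set to 0 the selector always returns the greedy action, which is a convenient way to evaluate a trained policy without exploration noise.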
Publication history
  • Received:  2021-10-10
  • Accepted:  2021-12-09
  • Published online:  2021-12-30
  • Issue published:  2023-08-31
