Proximal policy optimization for UAV autonomous guidance, tracking and obstacle avoidance

HU Duoxiu, DONG Wenhan, XIE Wujie

Citation: HU D X, DONG W H, XIE W J. Proximal policy optimization for UAV autonomous guidance, tracking and obstacle avoidance[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(1): 195-205 (in Chinese). doi: 10.13700/j.bh.1001-5965.2021.0182

    Corresponding author: E-mail: dongwenhan@sina.com

  • CLC number: TP181

  • Abstract:

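    To address the problem of a UAV tracking a dynamic ground target, Markov decision process (MDP) models are established for two phases: long-range autonomous guidance and close-range accompanying flight with obstacle avoidance. On this basis, an improved proximal policy optimization (PPO) algorithm is proposed. Because the data received by the UAV are sequential and the environment states are contextually correlated, the proposed algorithm adopts a long short-term memory (LSTM) network. It computes rewards from state information such as the real-time relative position between the UAV and the target, updates the network parameters, and iterates with adaptive optimization. Experiments on a ROS-based simulation test platform show that the proposed algorithm achieves safe and effective autonomous maneuvering throughout the entire reconnaissance mission. Compared with the conventional PPO algorithm, introducing the LSTM shortens model training time, noticeably improves tracking and obstacle-avoidance efficiency, and further enhances the robustness, accuracy, and real-time performance of the algorithm.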
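    The abstract names an improved PPO algorithm but does not give its update rule. As a point of reference, the following minimal Python sketch implements the standard PPO clipped surrogate objective with the clip parameter ε = 0.2 listed in Table 1. It assumes the probability ratio is formed from the log-probabilities of the new and old policies and that advantage estimates are already available; the LSTM policy network that would produce those log-probabilities, and any modifications the authors made to PPO, are not reproduced. The function names `clipped_surrogate` and `ppo_policy_loss` are hypothetical.

```python
import math
from typing import Sequence


def clipped_surrogate(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """Mean standard PPO-clip objective over a batch (to be maximized).

    ratio_t = pi_new(a_t|s_t) / pi_old(a_t|s_t), recovered from log-probabilities.
    Each term is min(ratio_t * A_t, clip(ratio_t, 1 - eps, 1 + eps) * A_t).
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio into [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def ppo_policy_loss(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """Loss for gradient descent: the negated clipped surrogate."""
    return -clipped_surrogate(new_logp, old_logp, advantages, epsilon)
```

    For a sample whose ratio is 1.5 and advantage is 2.0, the unclipped term is 3.0 but the clipped term is 1.2 × 2.0 = 2.4, so the objective takes 2.4. Taking the minimum keeps each update pessimistic and discourages large policy steps.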

     

  • Figure 1.  Relative position of UAV and its target

    Figure 2.  Relative positions of obstacle, target and UAV

    Figure 3.  Description of the Markov decision process model

    Figure 4.  Flowchart of autonomous guidance and obstacle avoidance during accompanying flight

    Figure 5.  Structure of the PPO algorithm

    Figure 6.  Schematic of the LSTM structure

    Figure 7.  Variation curves of single-step average reward

    Figure 8.  Motion paths in different scenarios based on the conventional PPO algorithm

    Figure 9.  Motion paths in different scenarios based on the improved PPO algorithm

    Table 1.  Simulation parameter settings

    | Parameter | Scenario 1 | Scenario 2 | Scenario 3 |
    |---|---|---|---|
    | Task phase | Autonomous guidance | Accompanying flight with obstacle avoidance | Entire process |
    | Model used | Autonomous guidance model | Accompanying-flight obstacle avoidance model | Autonomous guidance and accompanying-flight obstacle avoidance models |
    | Time period $N$/s | 0.0333 | 0.0333 | 0.0333 |
    | $\gamma$ | 0.99 | 0.99 | 0.99 |
    | $\varepsilon$ | 0.2 | 0.2 | 0.2 |
    | Initial conditions/m | $H_0^{\text{UAV}} = 30$, $70 \leqslant \left\lvert \boldsymbol{P}_0^{\text{UAV}} - \boldsymbol{P}_0^{\text{TAG}} \right\rvert \leqslant 100$ | $H_0^{\text{UAV}} = 10$, $50 \leqslant \left\lvert \boldsymbol{P}_0^{\text{UAV}} - \boldsymbol{P}_0^{\text{TAG}} \right\rvert \leqslant 80$ | $H_0^{\text{UAV}} = 30$, $70 \leqslant \left\lvert \boldsymbol{P}_0^{\text{UAV}} - \boldsymbol{P}_0^{\text{TAG}} \right\rvert \leqslant 100$ |
    | Termination conditions | $t \geqslant 35$ s or $\left\lvert \boldsymbol{P}_n^{\text{UAV}} - \boldsymbol{P}_n^{\text{TAG}} \right\rvert \leqslant 2$ m | $t \geqslant 35$ s or $\left\lvert \boldsymbol{P}_n^{\text{UAV}} - \boldsymbol{P}_n^{\text{TAG}} \right\rvert \leqslant 10$ m | $t \geqslant 35$ s or $\left\lvert \boldsymbol{P}_n^{\text{UAV}} - \boldsymbol{P}_n^{\text{TAG}} \right\rvert \leqslant 10$ m |
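    Table 1 fixes the control period, discount factor, initial geometry, and termination rules of each scenario. The sketch below encodes those settings in Python under assumptions the table does not state: positions are 3-D Cartesian coordinates in metres, the altitude $H$ is the z coordinate, the initial separation is the 3-D Euclidean distance, and both that distance and the UAV's bearing from the target are drawn uniformly. The names `Scenario`, `SCENARIOS`, `sample_initial_uav`, `is_terminal`, and `discounted_returns` are hypothetical and are not taken from the paper.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Scenario:
    """Per-scenario settings from Table 1 (distances in m, times in s)."""
    altitude0: float      # H_0^UAV
    dist0_min: float      # lower bound on |P_0^UAV - P_0^TAG|
    dist0_max: float      # upper bound on |P_0^UAV - P_0^TAG|
    capture_dist: float   # episode ends when |P_n^UAV - P_n^TAG| <= this
    t_max: float = 35.0   # episode ends when t >= t_max
    dt: float = 0.0333    # time period N
    gamma: float = 0.99   # discount factor
    epsilon: float = 0.2  # PPO clip parameter


SCENARIOS = {
    1: Scenario(altitude0=30.0, dist0_min=70.0, dist0_max=100.0, capture_dist=2.0),
    2: Scenario(altitude0=10.0, dist0_min=50.0, dist0_max=80.0, capture_dist=10.0),
    3: Scenario(altitude0=30.0, dist0_min=70.0, dist0_max=100.0, capture_dist=10.0),
}


def distance(a: Vec3, b: Vec3) -> float:
    """3-D Euclidean distance (assumed meaning of |P^UAV - P^TAG|)."""
    return math.dist(a, b)


def sample_initial_uav(sc: Scenario, target: Vec3, rng: random.Random) -> Vec3:
    """Place the UAV at altitude H_0 so its 3-D separation from the target
    is drawn uniformly from [dist0_min, dist0_max] (assumed distribution)."""
    d = rng.uniform(sc.dist0_min, sc.dist0_max)
    dz = sc.altitude0 - target[2]
    if d < abs(dz):
        raise ValueError("separation smaller than the altitude difference")
    horizontal = math.sqrt(d * d - dz * dz)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    return (target[0] + horizontal * math.cos(bearing),
            target[1] + horizontal * math.sin(bearing),
            sc.altitude0)


def is_terminal(sc: Scenario, t: float, p_uav: Vec3, p_tag: Vec3) -> bool:
    """Termination rule from Table 1: time limit reached or target within range."""
    return t >= sc.t_max or distance(p_uav, p_tag) <= sc.capture_dist


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, accumulated backwards over one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for i in range(len(rewards) - 1, -1, -1):
        g = rewards[i] + gamma * g
        returns[i] = g
    return returns
```

    With $N$ = 0.0333 s, the 35 s limit corresponds to about 1051 control steps per episode. Scenario 3 would additionally need a rule for switching from the guidance model to the obstacle-avoidance model, which Table 1 does not specify.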
Publication history
  • Received:  2021-04-09
  • Accepted:  2021-06-06
  • Published:  2021-06-15
