Proximal policy optimization for UAV autonomous guidance, tracking and obstacle avoidance

HU Duoxiu; DONG Wenhan; XIE Wujie

doi:10.13700/j.bh.1001-5965.2021.0182


1. Graduate School, Air Force Engineering University, Xi'an 710038, China
2. Aeronautics Engineering College, Air Force Engineering University, Xi'an 710038, China

Corresponding author: E-mail: dongwenhan@sina.com

CLC number: TP181
Publication history
- Received: 2021-04-09
- Accepted: 2021-06-06
- Published online: 2021-06-15
- Issue published: 2023-01-30


Abstract:
To address the problem of a multi-rotor UAV tracking a dynamic ground target, a Markov decision process model is established with two stages: long-distance autonomous guidance, and short-distance accompanying flight with obstacle avoidance. On this basis, an improved proximal policy optimization (PPO) algorithm is proposed. Because the data received by the UAV are sequential in time and the environment states are contextually correlated, the proposed algorithm adopts a long short-term memory (LSTM) network. It computes reward values from state information such as the real-time relative position of the UAV and the target, updates the network parameters, and iterates with adaptive optimization. Experiments were conducted on a ROS-based simulation test platform. The results show that the proposed algorithm safely and effectively achieves autonomous maneuvering throughout the whole reconnaissance mission. Compared with the traditional PPO algorithm, introducing the LSTM shortens the model training time and markedly improves the efficiency of tracking and obstacle avoidance, which further strengthens the robustness, accuracy, and real-time performance of the algorithm.

Keywords:
- multi-rotor UAV
- autonomous guidance
- Markov decision process
- proximal policy optimization
- long short-term memory
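
As a rough illustration of the PPO component named in the abstract, the following sketch computes the standard clipped surrogate objective for a batch of samples. It is not the authors' implementation: the function name `ppo_clip_objective` and its arguments are hypothetical, the LSTM policy network itself is not modeled, and the default `eps=0.2` assumes that the ε = 0.2 listed in Table 1 is the PPO clipping parameter.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    updated and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    eps: clipping range; 0.2 is assumed to match Table 1.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip to [1-eps, 1+eps]
        # Taking the minimum removes any incentive to move the ratio
        # outside the clipping range in the direction that raises the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For a single sample with probability ratio 2 and advantage 1, the clipped term caps the contribution at 1.2; with advantage -1 the unclipped value -2 is kept, so harmful updates are still fully penalized.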

Figure 1. Relative position of the UAV and its target
Figure 2. Relative position of the obstacle, target and UAV
Figure 3. Description of the Markov decision process model
Figure 4. Flowchart of autonomous guidance and accompanying-flight obstacle avoidance
Figure 5. Structure of the PPO algorithm
Figure 6. Schematic of the LSTM structure
Figure 7. Variation curves of the single-step average reward
Figure 8. Motion paths in different scenarios based on the traditional PPO algorithm
Figure 9. Motion paths in different scenarios based on the improved PPO algorithm

Table 1. Simulation parameter settings

Parameter	Scenario 1	Scenario 2	Scenario 3
Mission phase	Autonomous guidance	Accompanying flight with obstacle avoidance	Whole process
Model used	Autonomous guidance model	Accompanying-flight obstacle avoidance model	Autonomous guidance and accompanying-flight obstacle avoidance models
Time period $N$ (s)	$0.0333$	$0.0333$	$0.0333$
$\gamma$	$0.99$	$0.99$	$0.99$
$\varepsilon$	$0.2$	$0.2$	$0.2$
Initial conditions (m)	$H_0^{\text{UAV}} = 30$; $70 \leqslant \left\| \boldsymbol{P}_0^{\text{UAV}} - \boldsymbol{P}_0^{\text{TAG}} \right\| \leqslant 100$	$H_0^{\text{UAV}} = 10$; $50 \leqslant \left\| \boldsymbol{P}_0^{\text{UAV}} - \boldsymbol{P}_0^{\text{TAG}} \right\| \leqslant 80$	$H_0^{\text{UAV}} = 30$; $70 \leqslant \left\| \boldsymbol{P}_0^{\text{UAV}} - \boldsymbol{P}_0^{\text{TAG}} \right\| \leqslant 100$
Termination conditions	$t \geqslant 35\ \text{s}$ or $\left\| \boldsymbol{P}_n^{\text{UAV}} - \boldsymbol{P}_n^{\text{TAG}} \right\| \leqslant 2\ \text{m}$	$t \geqslant 35\ \text{s}$ or $\left\| \boldsymbol{P}_n^{\text{UAV}} - \boldsymbol{P}_n^{\text{TAG}} \right\| \leqslant 10\ \text{m}$	$t \geqslant 35\ \text{s}$ or $\left\| \boldsymbol{P}_n^{\text{UAV}} - \boldsymbol{P}_n^{\text{TAG}} \right\| \leqslant 10\ \text{m}$
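
The settings in Table 1 can be read as a small episode configuration. The sketch below encodes them and checks the termination condition; it is only an illustration under stated assumptions. All names (`Scenario`, `SCENARIOS`, `is_terminal`, `max_steps`, `discounted_return`) are hypothetical, $\gamma$ is assumed to be the reward discount factor, and the norm in the table is assumed to be the Euclidean distance between whatever position vectors are passed in.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    init_height_m: float         # H_0^UAV
    init_dist_min_m: float       # lower bound on initial UAV-target distance
    init_dist_max_m: float       # upper bound on initial UAV-target distance
    stop_dist_m: float           # distance threshold in the termination condition
    time_limit_s: float = 35.0   # episode ends when t >= 35 s
    step_s: float = 0.0333       # time period N
    gamma: float = 0.99          # assumed to be the discount factor


# Values copied from Table 1.
SCENARIOS = {
    1: Scenario(30.0, 70.0, 100.0, 2.0),
    2: Scenario(10.0, 50.0, 80.0, 10.0),
    3: Scenario(30.0, 70.0, 100.0, 10.0),
}


def is_terminal(sc, t_s, p_uav, p_tag):
    """True once the time limit is reached or the UAV is within stop_dist_m."""
    return t_s >= sc.time_limit_s or math.dist(p_uav, p_tag) <= sc.stop_dist_m


def max_steps(sc):
    """Upper bound on the number of decision steps in one episode."""
    return math.ceil(sc.time_limit_s / sc.step_s)


def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_k, accumulated backwards over one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With a time period of 0.0333 s and a 35 s limit, an episode lasts at most 1052 steps, which gives a sense of the sequence lengths a recurrent policy would process in these scenarios.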
