Proximal policy optimization for UAV autonomous guidance, tracking and obstacle avoidance

HU Duoxiu; DONG Wenhan; XIE Wujie

doi:10.13700/j.bh.1001-5965.2021.0182

Volume 49 Issue 1

Jan. 2023


Journal of Beijing University of Aeronautics and Astronautics > 2023 > 49(1): 195-205.

Citation: HU D X, DONG W H, XIE W J. Proximal policy optimization for UAV autonomous guidance, tracking and obstacle avoidance[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(1): 195-205 (in Chinese). doi: 10.13700/j.bh.1001-5965.2021.0182


1. Graduate School, Air Force Engineering University, Xi’an 710038, China
2. Aeronautics Engineering College, Air Force Engineering University, Xi’an 710038, China


Corresponding author: E-mail: dongwenhan@sina.com
Received Date: 09 Apr 2021
Accepted Date: 06 Jun 2021

Available Online: 02 Jun 2023

Publish Date: 15 Jun 2021

Abstract

For multi-rotor UAVs tracking dynamic ground targets, we establish a Markov decision process model with two stages: long-distance autonomous guidance, and short-distance accompanying flight with obstacle avoidance. An improved proximal policy optimization (PPO) algorithm is proposed. Because the data received by the UAV are sequential in time and the environment is context-dependent, the algorithm incorporates a long short-term memory (LSTM) network. It calculates reward values, updates the network parameters, and iterates the optimization adaptively from state information such as the real-time relative position of the UAV and the target. Experiments were conducted on a simulation test platform based on ROS. The results show that the proposed method achieves safe and effective autonomous maneuvering throughout the reconnaissance mission. Compared with the traditional PPO algorithm, the proposed algorithm shortens model training time owing to the LSTM network and significantly improves the efficiency of tracking and obstacle avoidance, which in turn strengthens the robustness, accuracy, and real-time performance of the algorithm.
- multi-rotor UAV
- autonomous guidance
- Markov decision process
- proximal policy optimization
- long short-term memory
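
For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of the clipped surrogate objective used by standard PPO. It is not the authors' improved algorithm and contains no LSTM component. The function name `ppo_clip_objective`, its parameter names, and the default `clip_eps=0.2` are illustrative assumptions, not taken from the paper.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    All names and the default clip_eps are illustrative assumptions.
    Inputs are equal-length sequences, one entry per sampled action.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Unchanged policy: every ratio is 1, so the objective is the mean advantage.
    print(ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))
```

The clipping removes any incentive to move the probability ratio far from 1 in a single update, which is what keeps each policy step "proximal" to the policy that collected the data.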


Figures(9) / Tables(1)
