基于PPO的移动平台自主导航

徐国艳; 熊绎维; 周彬; 陈冠宏

doi:10.13700/j.bh.1001-5965.2021.0100

基于PPO的移动平台自主导航

doi: 10.13700/j.bh.1001-5965.2021.0100

北京航空航天大学交通科学与工程学院特种车辆无人运输技术工业和信息化部重点实验室, 北京 100191

基金项目:

国家自然科学基金 51775016

详细信息

通讯作者:
徐国艳, E-mail: xuguoyan@buaa.edu.cn

中图分类号: TP242.6
计量
- 文章访问数: 343
- HTML全文浏览量: 132
- PDF下载量: 38
- 被引次数: 0
出版历程
- 收稿日期: 2021-03-02
- 录用日期: 2021-04-11
- 网络出版日期: 2021-05-07
- 整期出版日期: 2022-11-20

Autonomous navigation based on PPO for mobile platform

Key Laboratory of Autonomous Transportation Technology for Special Vehicles, Ministry of Industry and Information Technology, School of Transportation Science and Engineering, Beihang University, Beijing 100191, China

Funds:

National Natural Science Foundation of China 51775016

More Information

Corresponding author: XU Guoyan, E-mail: xuguoyan@buaa.edu.cn

摘要

摘要:
为解决强化学习算法在自主导航任务中动作输出不连续、训练收敛困难等问题, 提出了一种基于近似策略优化(PPO)算法的移动平台自主导航方法。在PPO算法的基础上设计了基于正态分布的动作策略函数, 解决了移动平台整车线速度和横摆角速度的输出动作连续性问题。设计了一种改进的人工势场算法作为自身位置评价, 有效提高强化学习模型在自主导航场景中的收敛速度。针对导航场景设计了模型的网络框架和奖励函数, 并在Gazebo仿真环境中进行模型训练, 结果表明, 引入自身位置评价的模型收敛速度明显提高。将收敛模型移植入真实环境中, 验证了所提方法的有效性。
- 近似策略优化算法 /
- 移动平台 /
- 自主导航 /
- 强化学习 /
- 人工势场
Abstract:
This paper presents an autonomous navigation method based on proximal policy optimization (PPO) algorithm for mobile platform. In this method, GNSS and LADAR are used for sensing environment information. To define the state of reinforcement learning model, an ego position evaluation method is introduced based on improved artificial potential field algorithm. After that, on the basis of PPO algorithm, a kind of action policy function is designed based on Gaussian distribution, which solves the continuity problem of the vehicle linear velocity and yaw velocity. Furthermore, the network framework and reward function of the model are also designed for navigation scenarios. In order to train the navigation model, a virtual environment based on Gazebo is built. The training results show that the ego position evaluation method obviously helps to improve the speed of model convergence. Finally, the navigation model is transplanted to a real environment, which verifies the effectiveness of the proposed method.
- proximal policy optimization algorithm /
- mobile platform /
- autonomous navigation /
- reinforcement learning /
- artificial potential field

HTML全文

图 1 状态空间构造示意图

Figure 1. Schematic diagram of state space construction

下载: 全尺寸图片幻灯片

图 2 模型网络框架示意图

Figure 2. Schematic diagram of model network framework

下载: 全尺寸图片幻灯片

图 3 Gazebo环境中的障碍物感知效果

Figure 3. Obstacle detection in Gazebo environment

下载: 全尺寸图片幻灯片

图 4 仿真环境训练场景

Figure 4. Training scenario in simulation environment

下载: 全尺寸图片幻灯片

图 5 导航模型训练结果

Figure 5. Training results of navigation model

下载: 全尺寸图片幻灯片

图 6 仿真环境中导航模型所习得路径

Figure 6. Learned path of navigation model in simulation environment

下载: 全尺寸图片幻灯片

图 7 真实环境下搭建的移动平台

Figure 7. Mobile platform built in real environment

下载: 全尺寸图片幻灯片

图 8 真实环境下激光雷达点云聚类效果

Figure 8. Clustering of LIDAR point cloud in real environment

下载: 全尺寸图片幻灯片

图 9 真实环境下自主导航场景

Figure 9. Autonomous navigation scene in real environment

下载: 全尺寸图片幻灯片

图 10 真实环境下生成的导航轨迹曲线

Figure 10. Navigation trajectory curve in real environment

下载: 全尺寸图片幻灯片

参考文献(15)

[1]	王义林. 地面无人平台自主导航避障系统的研究与实现[D]. 哈尔滨: 哈尔滨工业大学, 2020. WANG Y L. Research and implementation of autonomous navigation and obstacle avoidance system for ground unmanned platform[D]. Harbin: Harbin Institute of Technology, 2020(in Chinese).
[2]	秦圣然. 基于激光传感器的移动机器人导航系统研究[D]. 沈阳: 沈阳工业大学, 2020. QIN S R. Research on laser sensor-based navigation system for mobile robots[D]. Shenyang: Shenyang University of Technology, 2020(in Chinese).
[3]	HART P E, NILSSON N J, RAPHAEL B. A formal basis for the heuristic determination of minimum cost paths in graphs[J]. IEEE Transactions on Systems Science and Cybernetics, 1968, 4(2): 100-107. doi: 10.1109/TSSC.1968.300136
[4]	STENT A. Optimal and efficient path planning for partially-known environments[C]//Proceedings of IEEE International Conference on Robotics and Automation. Piscataway: IEEE Press, 1994, 4: 3310-3317.
[5]	STENT A. The focussed D^* algorithm for real-time replanning[C]//Proceedings of the 14th International Joint Conference on Artificial Intelligence. New York: ACM, 1995: 1652-1659.
[6]	LAVALLE S M, KUFFNER J J. Randomized kinodynamic planning[C]//Proceedings of IEEE International Conference on Robotics and Automation. Piscataway: IEEE Press, 1999, 1: 473-479.
[7]	付雪建. 基于强化学习的移动机器人自主导航研究[D]. 重庆: 重庆大学, 2017. FU X J. Research on autonomous navigation of mobile robots based on reinforcement learning[D]. Chongqing: Chongqing University, 2017(in Chinese).
[8]	杨宁博. 面向环境探测的全向模式移动机器人自主导航研究[D]. 哈尔滨: 哈尔滨工业大学, 2019. YANG N B. Research on autonomous navigation of omnidirectional mode mobile robots for environmental detection[D]. Harbin: Harbin Institute of Technology, 2019(in Chinese).
[9]	陶睿. 基于深度强化学习的移动机器人导航[D]. 济南: 山东大学, 2020. TAO R. Deep reinforcement learning-based navigation for mobile robots[D]. Jinan: Shandong University, 2020(in Chinese).
[10]	何聪. 基于深度强化学习的机器人视觉导航算法[D]. 南京: 东南大学, 2021. HE C. Robot visual navigation algorithm based on deep reinforcement learning[D]. Nanjing: Southeast University, 2021(in Chinese).
[11]	TAI L, PAOLO G, LIU M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway: IEEE Press, 2017: 31-36.
[12]	SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. (2017-08-28)[2021-03-01]. https://arxiv.org/abs/1707.06347.
[13]	刘志平, 余前勇, 査剑锋. 空间直角坐标至两类常用坐标的快速变换[J]. 测绘科学, 2015, 40(3): 8-11. https://www.cnki.com.cn/Article/CJFDTOTAL-CHKD201503002.htm LIU Z P, YU Q Y, ZHA J F. Fast coordinate transformations for both XYZ-BLH and XYZ-RhA[J]. Science of Surveying and Mapping, 2015, 40(3): 8-11(in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-CHKD201503002.htm
[14]	刘山洪, 邓彩群. 坐标转换与坐标变换研究[J]. 吉林建筑大学学报, 2016, 33(1): 43-47. https://www.cnki.com.cn/Article/CJFDTOTAL-JLJZ201601011.htm LIU S H, DENG C Q. Transformation of coordinate system[J]. Journal of Jilin Jianzhu University, 2016, 33(1): 43-47(in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-JLJZ201601011.htm
[15]	KHATIB O. Real-time obstacle avoidance system for manipulators and mobile robots[J]. International Journal of Robotics Research, 1986, 5(1): 90-98. doi: 10.1177/027836498600500106