Autonomous navigation based on PPO for mobile platform

XU Guoyan, XIONG Yiwei, ZHOU Bin, CHEN Guanhong

Citation: XU Guoyan, XIONG Yiwei, ZHOU Bin, et al. Autonomous navigation based on PPO for mobile platform[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(11): 2138-2145. doi: 10.13700/j.bh.1001-5965.2021.0100 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2021.0100
Funds: National Natural Science Foundation of China 51775016

    Corresponding author: XU Guoyan, E-mail: xuguoyan@buaa.edu.cn

  • CLC number: TP242.6

  • Abstract:

    To address the problems of discontinuous action output and slow training convergence when reinforcement learning algorithms are applied to autonomous navigation tasks, an autonomous navigation method for mobile platforms based on the proximal policy optimization (PPO) algorithm is proposed. Building on PPO, an action policy function based on the normal distribution is designed, which makes the output actions, namely the platform's linear velocity and yaw rate, continuous. An improved artificial potential field algorithm is designed as a self-position evaluation, which effectively improves the convergence speed of the reinforcement learning model in autonomous navigation scenarios. The network framework and reward function of the model are designed for the navigation scenario, and the model is trained in the Gazebo simulation environment. The results show that introducing the self-position evaluation markedly accelerates model convergence. The converged model is then deployed in a real environment, which verifies the effectiveness of the proposed method.
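    As a reading aid, the sketch below illustrates the two ideas highlighted in the abstract: a normal-distribution (Gaussian) action policy that lets PPO output continuous linear-velocity and yaw-rate commands, and an artificial-potential-field style self-position evaluation that can serve as a dense reward-shaping term. This is a minimal PyTorch sketch, not the authors' implementation; the network sizes, state layout, and potential-field coefficients (k_att, k_rep, d0) are illustrative assumptions.

    # Minimal sketch (illustrative, not the paper's code): Gaussian policy for
    # continuous (v, w) actions, the PPO clipped surrogate, and an APF-style
    # self-position evaluation usable as a reward-shaping term.
    import torch
    import torch.nn as nn


    class GaussianPolicy(nn.Module):
        """Maps the navigation state (e.g. laser ranges + relative goal pose) to a
        diagonal Gaussian over the 2-D action (linear velocity v, yaw rate w)."""

        def __init__(self, state_dim: int, hidden: int = 128):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
            )
            self.mu_head = nn.Linear(hidden, 2)          # mean of (v, w)
            self.log_std = nn.Parameter(torch.zeros(2))  # learned, state-independent

        def dist(self, state: torch.Tensor) -> torch.distributions.Normal:
            mu = self.mu_head(self.body(state))
            return torch.distributions.Normal(mu, self.log_std.exp())


    def ppo_clip_loss(policy, states, actions, old_log_prob, advantages,
                      clip_eps: float = 0.2) -> torch.Tensor:
        """PPO clipped surrogate: probability ratio clipped to [1-eps, 1+eps]."""
        new_log_prob = policy.dist(states).log_prob(actions).sum(dim=-1)
        ratio = (new_log_prob - old_log_prob).exp()
        surrogate = torch.min(
            ratio * advantages,
            torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages,
        )
        return -surrogate.mean()


    def position_evaluation(goal_dist: float, obstacle_dists, k_att: float = 1.0,
                            k_rep: float = 0.5, d0: float = 1.5) -> float:
        """APF-style position score: attractive term toward the goal, repulsive
        penalty near obstacles; higher values indicate a better position."""
        score = -k_att * goal_dist
        for d in obstacle_dists:
            if 0.0 < d < d0:  # only obstacles within influence range d0 repel
                score -= k_rep * (1.0 / d - 1.0 / d0) ** 2
        return score

    One plausible use, consistent with the abstract's description, is to add the step-to-step increase of position_evaluation to the sparse goal-reached and collision rewards, so the agent receives informative feedback at every step rather than only at the end of an episode.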


  • Figure 1.  Schematic diagram of state space construction

    Figure 2.  Schematic diagram of model network framework

    Figure 3.  Obstacle detection in Gazebo environment

    Figure 4.  Training scenario in simulation environment

    Figure 5.  Training results of navigation model

    Figure 6.  Learned path of navigation model in simulation environment

    Figure 7.  Mobile platform built in real environment

    Figure 8.  Clustering of LIDAR point cloud in real environment

    Figure 9.  Autonomous navigation scene in real environment

    Figure 10.  Navigation trajectory curve in real environment

Publication history
  • Received:  2021-03-02
  • Accepted:  2021-04-11
  • Published online:  2021-05-07
  • Issue date:  2022-11-20
