Abstract: Dynamic targets in indoor scenes degrade the robustness and camera localization accuracy of simultaneous localization and mapping (SLAM) systems. To address this problem, a dynamic visual SLAM algorithm based on an object detection network is proposed. YOLOv5s, the member of the YOLOv5 family with the smallest depth and feature-map width, is chosen as the object detection network, and its backbone is replaced with the lightweight PP-LCNet. After training on the VOC2007+VOC2012 dataset, the PP-LCNet-YOLOv5s model has 41.89% fewer network parameters and runs 39.13% faster than the original YOLOv5s. A parallel thread combining the improved object detection network with a sparse optical flow method is added to the tracking thread of the visual SLAM system to remove dynamic feature points, so that only static feature points are used for feature matching and camera pose estimation. Experimental results show that the proposed algorithm improves camera localization accuracy in dynamic scenes by 92.38% compared with ORB-SLAM3.
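The dynamic feature point filtering described above can be pictured with a short Python sketch. It is only illustrative: the box format, the flow-consistency threshold, the helper names, and the exact decision rule for potentially dynamic targets are assumptions rather than the paper's implementation; only the use of sparse Lucas-Kanade optical flow (OpenCV's cv2.calcOpticalFlowPyrLK), detection boxes from PP-LCNet-YOLOv5s, and the class split of Table 1 come from the text.

```python
# Illustrative sketch of dynamic feature point removal (not the paper's exact code).
import cv2
import numpy as np

# Class split following Table 1; label strings are assumptions and depend on the
# detector's training labels.
DYNAMIC = {"person", "bird", "cat", "dog", "robot vacuum"}
POTENTIALLY_DYNAMIC = {"chair", "tv", "cup", "book", "table"}

def in_box(point, box):
    """Return True if the feature point (x, y) lies inside the box (x1, y1, x2, y2)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def filter_dynamic_points(prev_gray, cur_gray, keypoints, detections, flow_thresh=2.0):
    """Keep only the feature points regarded as static.

    keypoints  -- list of (x, y) feature locations in the current frame
    detections -- list of (class_name, (x1, y1, x2, y2)) from the detector
    """
    pts = np.asarray(keypoints, dtype=np.float32).reshape(-1, 1, 2)
    # Track each point back into the previous frame with sparse LK optical flow.
    prev_pts, status, _ = cv2.calcOpticalFlowPyrLK(cur_gray, prev_gray, pts, None)
    flow = np.linalg.norm((pts - prev_pts).reshape(-1, 2), axis=1)
    median_flow = np.median(flow[status.ravel() == 1])  # dominant (camera) motion

    static_points = []
    for i, kp in enumerate(keypoints):
        classes_here = [cls for cls, box in detections if in_box(kp, box)]
        if any(cls in DYNAMIC for cls in classes_here):
            continue  # point lies on a dynamic target: discard
        if any(cls in POTENTIALLY_DYNAMIC for cls in classes_here):
            # Potentially dynamic region: keep the point only if its flow is
            # consistent with the dominant motion (threshold in pixels is assumed).
            if status[i, 0] == 0 or abs(flow[i] - median_flow) > flow_thresh:
                continue
        static_points.append(kp)
    return static_points
```

In the actual system this filtering runs in a parallel thread of the tracking pipeline, so only the surviving static points enter feature matching and camera pose estimation.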
Table 1. Dynamic property settings for indoor targets

Target         Category          Target   Category
person         dynamic target    chair    potentially dynamic target
bird           dynamic target    tv       potentially dynamic target
cat            dynamic target    cup      potentially dynamic target
dog            dynamic target    book     potentially dynamic target
robot vacuum   dynamic target    table    potentially dynamic target
Table 2. Performance of the object detection networks on CPU
Model              Compute/GFLOPs   Parameters   Time per frame/ms   Precision/%   Recall/%   mAP@0.5/%
PP-LCNet-YOLOv5s   8.7              4.3×10⁶      11.2                76.1          84.3       73.0
YOLOv5s            17.4             7.4×10⁶      18.4                78.9          87.5       75.6
YOLOv5m            37.4             21.4×10⁶     26.8                81.2          90.5       78.8
YOLOv5l            82.1             47.1×10⁶     43.9                83.2          91.1       80.7
YOLOv5x            141.8            89.2×10⁶     71.0                84.0          92.0       81.6
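The figures quoted in the abstract follow directly from this table: the parameter count drops from 7.4×10⁶ to 4.3×10⁶, a reduction of (7.4 − 4.3)/7.4 ≈ 41.89%, and the CPU time per frame drops from 18.4 ms to 11.2 ms, i.e. the model runs (18.4 − 11.2)/18.4 ≈ 39.13% faster, at the cost of a 2.6-point drop in mAP@0.5.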
Table 3. Positioning accuracy (unit: m) and time per frame (unit: s) under different strategies
Values are given as Strategy 1 / Strategy 2.
Sequence             RMSE/m            Std/m             Time per frame/s
sitting_static       0.0079 / 0.0103   0.0043 / 0.0067   0.0813 / 0.0631
walking_halfsphere   0.0196 / 0.0204   0.0098 / 0.0107   0.0781 / 0.0664
Table 4. Absolute trajectory error (ATE) of the ORB-SLAM3 algorithm and the improved algorithm (unit: m)
Values are given as ORB-SLAM3 / improved algorithm.
Sequence             RMSE              Std               Mean error        Median error
sitting_xyz          0.0092 / 0.0115   0.0053 / 0.0075   0.0079 / 0.0089   0.0073 / 0.0079
sitting_static       0.0071 / 0.0103   0.0035 / 0.0067   0.0063 / 0.0072   0.0057 / 0.0063
walking_rpy          0.3571 / 0.0302   0.1717 / 0.0168   0.2983 / 0.0225   0.2846 / 0.0203
walking_halfsphere   0.2185 / 0.0204   0.0879 / 0.0107   0.1999 / 0.0174   0.1864 / 0.0161
walking_xyz          0.2831 / 0.0143   0.1263 / 0.0074   0.2534 / 0.0122   0.2278 / 0.0107
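From Table 4, the relative reduction in ATE RMSE on the three high-dynamic walking sequences is (0.3571 − 0.0302)/0.3571 ≈ 91.54% for walking_rpy, (0.2185 − 0.0204)/0.2185 ≈ 90.66% for walking_halfsphere, and (0.2831 − 0.0143)/0.2831 ≈ 94.95% for walking_xyz, while on the low-dynamic sitting sequences both systems stay within about 1 cm of the ground truth.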
Table 5. Relative pose error (RPE) of the ORB-SLAM3 algorithm and the improved algorithm (unit: m)
Values are given as ORB-SLAM3 / improved algorithm.
Sequence             RMSE              Std               Mean error        Median error
sitting_xyz          0.0096 / 0.0119   0.0061 / 0.0064   0.0084 / 0.0100   0.0075 / 0.0084
sitting_static       0.0048 / 0.0074   0.0025 / 0.0046   0.0041 / 0.0067   0.0036 / 0.0059
walking_rpy          0.3741 / 0.0319   0.3316 / 0.0219   0.4213 / 0.0223   0.4113 / 0.0211
walking_halfsphere   0.1534 / 0.0143   0.0984 / 0.0087   0.1175 / 0.0113   0.1025 / 0.0105
walking_xyz          0.1497 / 0.0121   0.1257 / 0.0071   0.1371 / 0.0098   0.1224 / 0.0087
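The same calculation applied to Table 5 gives RPE RMSE reductions of roughly (0.3741 − 0.0319)/0.3741 ≈ 91.5% on walking_rpy, (0.1534 − 0.0143)/0.1534 ≈ 90.7% on walking_halfsphere, and (0.1497 − 0.0121)/0.1497 ≈ 91.9% on walking_xyz, so the gain from removing dynamic feature points shows up in both error metrics.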
Table 6. ATE comparison between the improved algorithm and other dynamic SLAM algorithms (unit: %)
Sequence             DynaSLAM   RTD-SLAM   RDS-SLAM   Improved algorithm
walking_rpy          93.64      92.46      90.68      91.54
walking_halfsphere   88.79      90.13      84.46      90.66
walking_xyz          93.31      95.57      91.68      94.95
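The improved-algorithm column reproduces the per-sequence ATE RMSE reductions computed from Table 4; their mean, (91.54 + 90.66 + 94.95)/3 ≈ 92.38%, is the overall accuracy improvement over ORB-SLAM3 quoted in the abstract.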
Table 7. Tracking-thread time and per-frame time of the improved algorithm and other algorithms
Algorithm            Network model      Tracking thread time/ms   Time per frame/ms
ORB-SLAM3            -                  26                        -
DynaSLAM             Mask R-CNN         200                       230
RTD-SLAM             YOLOv5s            43                        82
Improved algorithm   PP-LCNet-YOLOv5s   31                        65
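At 65 ms per frame the improved system runs at roughly 15 frames/s, compared with about 4 frames/s for DynaSLAM (230 ms per frame), while its tracking thread stays close to that of ORB-SLAM3 (31 ms versus 26 ms).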