Visual-inertial navigation method based on semantic segmentation and geometric constraints in dynamic environment

ZHANG Wenke^{1, 2}, HAN Peng^{1}, FENG Yu^{1}, GAO Dong^{1}

doi: 10.13700/j.bh.1001-5965.2024.0016

1. Key Laboratory of Electronics and Information Technology for Space Systems, National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China

Funds: Science Foundation of the Chinese Academy of Sciences (8091A100113)

Corresponding author: E-mail: hanpeng@nssc.ac.cn

CLC number: TP242.6

Publication history
- Received: 2024-01-10
- Accepted: 2024-02-23
- Published online: 2024-03-18
- Issue published: 2026-04-30

Abstract:
In practical simultaneous localization and mapping (SLAM) applications, many feature points imaged from moving objects take part in feature tracking, which reduces the accuracy and robustness of the algorithm. Conventional dynamic SLAM schemes respond by discarding dynamic features, but this can leave too few static features and degrade SLAM performance. To address both problems, a dynamic visual-inertial integrated navigation method based on semantic segmentation and geometric constraints is proposed. Prior dynamic masks are generated from a semantic segmentation network together with a per-class degree of confidence that objects of that class are dynamic, and feature points are extracted with an improved method that suppresses prior dynamic features. Inertial measurement unit (IMU) pre-integration is then combined with geometric constraints to judge whether each feature point is truly dynamic, and a culling strategy removes the points judged to be dynamic. Finally, the remaining static feature points are used for tracking and localization. Compared with ORB-SLAM3, the proposed algorithm improves localization accuracy by 73.05% on average on the indoor dynamic dataset TUM and by 19.85% on average on the outdoor dynamic dataset KITTI. Its accuracy is also higher than that of conventional dynamic SLAM algorithms.
Keywords:
- dynamic scenes
- simultaneous localization and mapping
- semantic information
- feature extraction
- geometric constraints
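
The first stage described in the abstract, building a prior dynamic mask from semantic labels and a per-class dynamic confidence, can be sketched as follows. This is only an illustration under stated assumptions: the class IDs, the confidence values in `DYNAMIC_TRUST`, the `threshold`, and both function names are hypothetical and are not taken from the paper. Keypoints inside the mask are only flagged as prior-dynamic candidates here; in the described method their true status is decided later by the IMU-aided geometric check.

```python
# Hypothetical per-class dynamic confidence (class id -> value in [0, 1]).
# The paper's actual classes and values are not given on this page.
DYNAMIC_TRUST = {0: 0.0,   # background
                 1: 0.9,   # e.g. person
                 2: 0.6}   # e.g. car


def prior_dynamic_mask(label_map, trust=DYNAMIC_TRUST, threshold=0.5):
    """Return a boolean mask: True where the pixel's class is likely dynamic."""
    return [[trust.get(cls, 0.0) >= threshold for cls in row]
            for row in label_map]


def split_keypoints(keypoints, mask):
    """Split (u, v) keypoints into static ones and prior-dynamic candidates."""
    static, candidates = [], []
    for u, v in keypoints:
        # mask is indexed as mask[row][col], i.e. mask[v][u]
        (candidates if mask[v][u] else static).append((u, v))
    return static, candidates


if __name__ == "__main__":
    labels = [[0, 1, 1],
              [0, 0, 2],
              [0, 0, 0]]
    mask = prior_dynamic_mask(labels)
    print(split_keypoints([(0, 0), (1, 0), (2, 1), (1, 2)], mask))
```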

Figure 1. Overall framework of the dynamic SLAM algorithm based on semantic segmentation and geometric constraints

Figure 2. Overlay of simultaneous object detection and semantic segmentation by Multi YOLOv5s on the KITTI dataset

Figure 3. Feature point extraction flow on each pyramid level in the improved feature extraction method

Figure 4. Comparison of the RMSE of the ATE and the average number of feature points culled per frame under different dilution factors

Figure 5. Overall framework of the method for judging the true dynamic status of feature points

Figure 6. Schematic of judging the true dynamic status of feature points by 3D distance error
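
A 3D distance error test of the kind named in the Figure 6 caption might look like the sketch below. It assumes that a relative camera pose `(R, t)` from the previous frame to the current frame is available (for example, predicted from IMU pre-integration) and that each feature has a 3D position in both camera frames. The function names and the `threshold` value are hypothetical, and the paper's actual criterion may differ.

```python
import math


def transform(R, t, p):
    """Apply the rigid transform p' = R p + t (R is a 3x3 nested list)."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def is_dynamic_3d(p_prev, p_curr, R, t, threshold=0.1):
    """Flag a point as dynamic if its predicted and observed 3D positions
    in the current frame differ by more than `threshold` (metres, assumed)."""
    predicted = transform(R, t, p_prev)
    error = math.dist(predicted, p_curr)
    return error > threshold, error


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t = [0.0, 0.0, -0.5]  # camera moved 0.5 m forward between frames
    # A point that stayed put in the world: small residual
    print(is_dynamic_3d([1.0, 0.0, 4.0], [1.01, 0.0, 3.52], identity, t))
    # A point that moved sideways by itself: large residual
    print(is_dynamic_3d([-1.0, 0.0, 5.0], [-0.6, 0.0, 4.5], identity, t))
```

A static point only appears to move because the camera moved, so the predicted pose explains its new position and the residual stays near the measurement noise. An independently moving point leaves a residual that the camera motion cannot account for.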

Figure 7. Comparison of the running results of the improved algorithm on the TUM dataset

Figure 8. Comparison of the estimated trajectories of each algorithm with the ground-truth trajectories on the TUM dataset

Figure 9. Comparison between the running results of the improved algorithm and the original images on sequences 01 and 04

Figure 10. Running results of the improved algorithm on sequences 07 and 09

Table 1. Comparison of absolute trajectory errors of the algorithms on the TUM dataset

Sequence	STD/m, ORB-SLAM3 (RGB-D mode)	STD/m, improved algorithm using only the semantic segmentation network	STD/m, improved algorithm with the improved feature extraction method	RMSE/m, ORB-SLAM3 (RGB-D mode)	RMSE/m, improved algorithm using only the semantic segmentation network	RMSE/m, improved algorithm with the improved feature extraction method	RMSE improvement rate 1/% (vs. semantic segmentation only)	RMSE improvement rate 2/% (vs. ORB-SLAM3)
walking_xyz	0.1146	0.0099	0.0083	0.2704	0.0223	0.0204	8.52	92.46
walking_rpy	0.0785	0.0350	0.0287	0.1608	0.0541	0.0487	9.98	69.71
walking_halfsphere	0.1658	0.0406	0.0331	0.2913	0.0548	0.0467	14.78	83.97
walking_static	0.0086	0.0033	0.0032	0.0152	0.0088	0.0082	6.82	46.05
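
For readers unfamiliar with the two statistics reported in Table 1, the sketch below computes the RMSE and standard deviation of the absolute trajectory error from per-frame position errors. It assumes the estimated and ground-truth trajectories are already time-associated and aligned in a common frame, and it uses the population standard deviation. The function name `ate_stats` and the toy trajectories are illustrative only, not the evaluation code used in the paper.

```python
import math
from statistics import fmean, pstdev


def ate_stats(estimated, ground_truth):
    """Return (RMSE, STD) of the per-frame translational error.

    Both inputs are equal-length lists of (x, y, z) positions that are
    assumed to be already time-associated and aligned.
    """
    errors = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    rmse = math.sqrt(fmean([err * err for err in errors]))
    return rmse, pstdev(errors)


if __name__ == "__main__":
    est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.4)]
    rmse, std = ate_stats(est, gt)
    print(f"RMSE = {rmse:.4f} m, STD = {std:.4f} m")
```

With the population standard deviation, RMSE² equals the squared mean error plus STD², so a sequence with a small STD but a large RMSE has a consistent offset rather than scattered errors.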

Table 2. Comparison of relative pose errors of the algorithms on the TUM dataset

Sequence	STD/m, ORB-SLAM3 (RGB-D mode)	STD/m, improved algorithm using only the semantic segmentation network	STD/m, improved algorithm with the improved feature extraction method	RMSE/m, ORB-SLAM3 (RGB-D mode)	RMSE/m, improved algorithm using only the semantic segmentation network	RMSE/m, improved algorithm with the improved feature extraction method	RMSE improvement rate 1/% (vs. semantic segmentation only)	RMSE improvement rate 2/% (vs. ORB-SLAM3)
walking_xyz	0.0089	0.0091	0.0088	0.0170	0.0156	0.0153	1.92	10.00
walking_rpy	0.0303	0.0271	0.0230	0.0357	0.0341	0.0301	11.73	15.69
walking_halfsphere	0.0177	0.0151	0.0102	0.0240	0.0219	0.0188	14.16	21.67
walking_static	0.0125	0.0039	0.0037	0.0152	0.0074	0.0072	2.70	52.63

Table 3. Comparison of the RMSE of the ATE between the improved algorithm and other dynamic SLAM algorithms on the TUM dataset (unit: m)

Sequence	ORB-SLAM3^[2]	DS-SLAM^[4]	Detect-SLAM^[6]	ReFusion^[17]	D³FlowSLAM^[18]	Improved algorithm (this paper)
walking_xyz	0.2704	0.0247	0.0241	0.099	0.018	0.0204
walking_rpy	0.1608	0.4442	0.2959	-	0.057	0.0487
walking_halfsphere	0.2913	0.0303	0.0514	0.104	0.42	0.0467
walking_static	0.0152	0.0081	-	0.017	0.007	0.0082
Note: the best result is shown in bold in the original table; "-" marks a value that is not reported.

Table 4. Comparison of absolute trajectory errors of the algorithms on the KITTI dataset

Sequence	STD/m, ORB-SLAM3 (stereo visual-inertial mode)	STD/m, dynamic SLAM scheme without the geometric method	STD/m, dynamic SLAM algorithm based on semantic segmentation and geometric constraints	RMSE/m, ORB-SLAM3 (stereo visual-inertial mode)	RMSE/m, dynamic SLAM scheme without the geometric method	RMSE/m, dynamic SLAM algorithm based on semantic segmentation and geometric constraints	RMSE improvement rate 1/% (vs. scheme without the geometric method)	RMSE improvement rate 2/% (vs. ORB-SLAM3)
01	1.12	1.1455	0.9417	2.6673	2.5815	1.9759	23.46	25.92
04	0.1073	0.0558	0.0534	0.2283	0.1503	0.1469	2.26	35.65
07	0.5029	0.4991	0.4583	0.8926	0.9244	0.8583	7.15	3.84
09	0.5943	0.9086	0.5362	1.3086	1.8222	1.1254	38.24	14.00
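
The improvement rates in Tables 1 and 4 are consistent with the usual relative reduction (baseline RMSE minus improved RMSE, divided by baseline RMSE), and averaging the rates against ORB-SLAM3 over the four sequences of each dataset reproduces the 73.05% and 19.85% quoted in the abstract. The short check below uses the ATE RMSE values copied from those two tables; the function and variable names are illustrative.

```python
def improvement_rate(baseline_rmse, improved_rmse):
    """Relative RMSE reduction in percent."""
    return (baseline_rmse - improved_rmse) / baseline_rmse * 100.0


def mean_improvement(table):
    """Average improvement rate over all sequences in a table."""
    rates = [improvement_rate(base, ours) for base, ours in table.values()]
    return sum(rates) / len(rates)


# ATE RMSE in metres: (ORB-SLAM3, proposed algorithm), from Tables 1 and 4.
TUM_ATE_RMSE = {
    "walking_xyz": (0.2704, 0.0204),
    "walking_rpy": (0.1608, 0.0487),
    "walking_halfsphere": (0.2913, 0.0467),
    "walking_static": (0.0152, 0.0082),
}
KITTI_ATE_RMSE = {
    "01": (2.6673, 1.9759),
    "04": (0.2283, 0.1469),
    "07": (0.8926, 0.8583),
    "09": (1.3086, 1.1254),
}

if __name__ == "__main__":
    print(f"TUM mean improvement:   {mean_improvement(TUM_ATE_RMSE):.2f}%")
    print(f"KITTI mean improvement: {mean_improvement(KITTI_ATE_RMSE):.2f}%")
```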

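Table 5. Comparison of relative pose errors of the algorithms on the KITTI dataset

Sequence	STD/m, ORB-SLAM3 (stereo visual-inertial mode)	STD/m, dynamic SLAM scheme without the geometric method	STD/m, dynamic SLAM algorithm based on semantic segmentation and geometric constraints	RMSE/m, ORB-SLAM3 (stereo visual-inertial mode)	RMSE/m, dynamic SLAM scheme without the geometric method	RMSE/m, dynamic SLAM algorithm based on semantic segmentation and geometric constraints	RMSE improvement rate 1/% (vs. scheme without the geometric method)	RMSE improvement rate 2/% (vs. ORB-SLAM3)
01	0.0429	0.03	0.0242	0.0662	0.051	0.0457	10.39	30.97
04	0.0119	0.0115	0.0076	0.0224	0.0191	0.0156	18.32	30.36
07	0.01	0.0095	0.0085	0.0194	0.0176	0.0158	10.23	18.56
09	0.0114	0.0092	0.0091	0.0223	0.0198	0.0191	3.54	14.35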

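Table 6. Comparison of the RMSE of the ATE between the improved algorithm and other dynamic SLAM algorithms on the KITTI dataset

Sequence	ATE RMSE/m, DynaSLAM^[20]	ATE RMSE/m, Dynamic-SLAM^[21]	ATE RMSE/m, improved algorithm (this paper)	Improvement over base algorithm/%, DynaSLAM^[20]	Improvement over base algorithm/%, Dynamic-SLAM^[21]	Improvement over base algorithm/%, improved algorithm (this paper)
01	9.4	-	1.9759	9.62	-	25.92
04	0.2	1.109	0.1469	0	10.06	35.65
07	0.5	1.823	0.8583	0	7.08	3.84
09	1.6	9.285	1.1254	50	10.50	14.00
Note: the best result is shown in bold in the original table; "-" marks a value that is not reported.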

References (21)

[1]	LIU Z, SHI D X, YANG S W, et al. Review of visual-inertial navigation system initialization method[J]. Journal of National University of Defense Technology, 2023, 45(2): 15-26 (in Chinese).
[2]	CAMPOS C, ELVIRA R, RODRÍGUEZ J J G, et al. ORB-SLAM3: an accurate open-source library for visual, visual-inertial, and multimap SLAM[J]. IEEE Transactions on Robotics, 2021, 37(6): 1874-1890.
[3]	QIN T, CAO S Z, PAN J, et al. A general optimization-based framework for global pose estimation with multiple sensors[EB/OL]. (2019-01-11)[2024-01-05]. https://arxiv.org/abs/1901.03642.
[4]	YU C, LIU Z X, LIU X J, et al. DS-SLAM: a semantic visual SLAM towards dynamic environments[C]//Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE Press, 2019: 1168-1174.
[5]	BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
[6]	ZHONG F W, WANG S, ZHANG Z Q, et al. Detect-SLAM: making object detection and SLAM mutually beneficial[C]//Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2018: 1001-1010.
[7]	ZHANG J, HENEIN M, MAHONY R, et al. VDO-SLAM: a visual dynamic object-aware SLAM system[EB/OL]. (2021-12-14)[2024-01-05]. https://arxiv.org/abs/2005.11052.
[8]	DAI W C, ZHANG Y, LI P, et al. RGB-D SLAM in dynamic environments using point correlations[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(1): 373-389.
[9]	CHENG J H, WANG Z, ZHOU H Y, et al. DM-SLAM: a feature-based SLAM system for rigid dynamic scenes[J]. ISPRS International Journal of Geo-Information, 2020, 9(4): 202.
[10]	JIANG C J, LIU P, SHU P. Dynamic visual SLAM algorithm based on improved YOLOv5s[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(3): 763-771 (in Chinese).
[11]	YUAN X, CHEN S. SaD-SLAM: a visual SLAM based on semantic and depth information[C]//Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE Press, 2021: 4930-4935.
[12]	LI A, WANG J K, XU M, et al. DP-SLAM: a visual SLAM with moving probability towards dynamic environments[J]. Information Sciences, 2021, 556: 128-142.
[13]	STURM J, ENGELHARD N, ENDRES F, et al. A benchmark for the evaluation of RGB-D SLAM systems[C]//Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE Press, 2012: 573-580.
[14]	YU C Q, WANG J B, PENG C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 334-349.
[15]	GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: the KITTI dataset[J]. The International Journal of Robotics Research, 2013, 32(11): 1231-1237.
[16]	GÁLVEZ-LÓPEZ D, TARDÓS J D. Bags of binary words for fast place recognition in image sequences[J]. IEEE Transactions on Robotics, 2012, 28(5): 1188-1197.
[17]	PALAZZOLO E, BEHLEY J, LOTTES P, et al. ReFusion: 3D reconstruction in dynamic environments for RGB-D cameras exploiting residuals[C]//Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE Press, 2020: 7855-7862.
[18]	YU X Y, YE W C, GUO X Y, et al. D³FlowSLAM: self-supervised dynamic SLAM with flow motion decomposition and DINO guidance[EB/OL]. (2024-08-21)[2026-01-07]. https://arxiv.org/abs/2207.08794.
[19]	MUR-ARTAL R, TARDÓS J D. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras[J]. IEEE Transactions on Robotics, 2017, 33(5): 1255-1262.
[20]	BESCOS B, FÁCIL J M, CIVERA J, et al. DynaSLAM: tracking, mapping, and inpainting in dynamic scenes[J]. IEEE Robotics and Automation Letters, 2018, 3(4): 4076-4083.
[21]	XIAO L H, WANG J G, QIU X S, et al. Dynamic-SLAM: semantic monocular visual localization and mapping based on deep learning in dynamic environment[J]. Robotics and Autonomous Systems, 2019, 117: 1-16.