Abstract: A point cloud semantic segmentation method for unstructured road scenes, typified by open-pit mining areas, is proposed to address harsh environmental conditions, blurred road boundaries, and large variations in obstacle size. The method comprises three parts: preprocessing, a feature extraction network, and inverse processing. Preprocessing maps the 3D point cloud onto a 2D Range View (RV) image through coordinate transformation to speed up network inference. The feature extraction network contains a convolutional attention module and a multi-scale residual module: the convolutional attention module refines segmentation boundaries to resolve blurred road edges, while the multi-scale residual module uses large convolution kernels to enlarge the receptive field and fuses upsampling and downsampling features to accommodate the large variation in obstacle size on unstructured roads. Inverse processing corrects the semantic labels with the K-nearest neighbor (KNN) algorithm and maps the point cloud back into 3D space. Tested on a typical unstructured-road open-pit mine dataset, the proposed method achieves a mean intersection-over-union (mIoU) of 85.1% with an inference time of 6.423 ms, a 3-percentage-point improvement in overall accuracy over mainstream spherical-projection-based semantic segmentation networks. The method has also been deployed in real unstructured road scenarios.
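In spherical-projection methods of this family (e.g. RangeNet++, SalsaNext), the RV preprocessing maps each point's azimuth and elevation angles to image coordinates and stores the point's range at that pixel. A minimal NumPy sketch; the function name and the field-of-view defaults (typical of a 64-beam LiDAR) are illustrative assumptions, not values from the paper:

```python
import numpy as np

def project_to_range_view(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherically project an (N, 3) LiDAR point cloud onto an H x W range image."""
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = abs(fov_up) + abs(fov_down)              # total vertical field of view

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)             # range of each point

    yaw = np.arctan2(y, x)                         # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    # normalize angles to pixel coordinates
    u = 0.5 * (1.0 - yaw / np.pi) * W              # column index
    v = (1.0 - (pitch + abs(fov_down)) / fov) * H  # row index

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    range_image = np.full((H, W), -1.0, dtype=np.float32)  # -1 marks empty pixels
    range_image[v, u] = r                          # later points overwrite earlier ones
    return range_image, u, v
```

The per-point pixel indices (u, v) are retained so that the inverse processing step can scatter per-pixel predictions back onto the original 3D points before KNN refinement.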
Key words:
- 3D point cloud
- semantic segmentation
- unstructured roads
- deep learning
- attention mechanism
Table 1. Dataset category labels

| Category | RGB channel values |
| --- | --- |
| Car | [0, 255, 0] |
| Mining truck | [255, 0, 0] |
| Pedestrian | [0, 255, 255] |
| Road | [255, 255, 255] |
| Other | [100, 100, 255] |

Table 2. Accuracy and inference time of each network

| Network | IoU (Car)/% | IoU (Mining truck)/% | IoU (Pedestrian)/% | IoU (Road)/% | IoU (Other)/% | Mean accuracy/% | mIoU/% | Time/ms |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SalsaNext | 63.7 | 65.9 | 94.7 | 92.5 | 93.6 | 96.3 | 82.1 | 6.91 |
| SalsaNext+CM | 66.1 | 64.0 | 94.9 | 93.0 | 93.9 | 96.5 | 82.4 | 6.678 |
| SalsaNext+CAM | 64.9 | 73.0 | 94.8 | 93.1 | 94.1 | 96.6 | 84.0 | 7.238 |
| SalsaNext+RDM+RUM | 65.0 | 72.7 | 94.8 | 92.3 | 93.4 | 96.3 | 83.6 | 6.273 |
| Ours (SalsaNext+CAM+RDM+RUM) | 68.8 | 74.5 | 94.8 | 93.2 | 94.2 | 96.7 | 85.1 | 6.423 |

Table 3. Accuracy and inference time of spherical-projection-based networks

| Network | IoU (Car)/% | IoU (Mining truck)/% | IoU (Pedestrian)/% | IoU (Road)/% | IoU (Other)/% | Mean accuracy/% | mIoU/% | Time/ms |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RangeNet++ | 38.0 | 36.3 | 92.4 | 91.2 | 92.1 | 95.4 | 70.0 | 15.83 |
| SqueezeSegV2 | 44.1 | 39.3 | 92.9 | 92.7 | 93.3 | 96.1 | 72.4 | 49.47 |
| SqueezeSegV3 | 57.7 | 56.8 | 94.8 | 93.5 | 94.2 | 96.7 | 79.4 | 50.00 |
| SalsaNet | 43.2 | 47.5 | 92.1 | 91.7 | 92.3 | 95.6 | 73.4 | 7.83 |
| SalsaNext | 63.7 | 65.9 | 94.7 | 92.5 | 93.6 | 96.3 | 82.1 | 6.91 |
| Ours | 68.8 | 74.5 | 94.8 | 93.2 | 94.2 | 96.7 | 85.1 | 6.42 |
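The class IoU and mIoU columns above follow the standard confusion-matrix definition IoU_c = TP_c / (TP_c + FP_c + FN_c), with mIoU the mean over classes. A minimal NumPy sketch with toy labels (illustrative, not dataset values):

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """IoU per class from flat integer label arrays, via a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt, pred), 1)   # rows: ground truth, cols: prediction
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp       # predicted as c but actually another class
    fn = cm.sum(axis=1) - tp       # actually c but predicted as another class
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), 0.0)

# toy labels for three classes
pred = np.array([0, 0, 1, 1, 2])
gt = np.array([0, 1, 1, 1, 2])
iou = per_class_iou(pred, gt, 3)   # [0.5, 0.667, 1.0]
miou = iou.mean()
```

Computing IoU from an accumulated confusion matrix lets the metric be aggregated over an entire test split rather than averaged per frame, which is how range-image benchmarks such as SemanticKITTI report it.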
[1] RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]//Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[2] BADRINARAYANAN V, HANDA A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling[EB/OL]. (2015-01-01)[2023-02-03]. https://arxiv.org/abs/1505.07293v1.
[3] POUDEL R P K, LIWICKI S, CIPOLLA R. Fast-SCNN: Fast semantic segmentation network[EB/OL]. (2019-01-01)[2023-02-03]. https://arxiv.org/abs/1902.04502v1.
[4] GUO M H, LU C Z, HOU Q B, et al. SegNeXt: Rethinking convolutional attention design for semantic segmentation[EB/OL]. (2022-01-01)[2023-02-03]. https://arxiv.org/abs/2209.08575v1.
[5] ZHAO Y, TIAN W, CHENG H. Pyramid Bayesian method for model uncertainty evaluation of semantic segmentation in autonomous driving[J]. Automotive Innovation, 2022, 5(1): 70-78. doi: 10.1007/s42154-021-00165-x.
[6] MA S G, CHEN Q M, HOU Z Q, et al. Lightweight semantic segmentation algorithm based on GLCNet[J/OL]. Journal of Beijing University of Aeronautics and Astronautics, (2023-01-14)[2023-02-03]. https://doi.org/10.13700/j.bh.1001-5965.2022.0822 (in Chinese).
[7] YANG J, ZHANG C. Semantic segmentation of point clouds by fusing dual attention mechanism and dynamic graph convolution[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(10): 2984-2994 (in Chinese).
[8] CHARLES R Q, HAO S, MO K C, et al. PointNet: Deep learning on point sets for 3D classification and segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 77-85.
[9] QI C R, YI L, SU H, et al. PointNet++: Deep hierarchical feature learning on point sets in a metric space[EB/OL]. (2017-01-01)[2023-02-03]. https://arxiv.org/abs/1706.02413v1.
[10] THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: Flexible and deformable convolution for point clouds[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 6410-6419.
[11] RETHAGE D, WALD J, STURM J, et al. Fully-convolutional point networks for large-scale point clouds[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 596-611.
[12] DAI A, RITCHIE D, BOKELOH M, et al. ScanComplete: Large-scale scene completion and semantic segmentation for 3D scans[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4578-4587.
[13] GRAHAM B, ENGELCKE M, VAN DER MAATEN L. 3D semantic segmentation with submanifold sparse convolutional networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 9224-9232.
[14] WU B C, WAN A, YUE X Y, et al. SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud[C]//Proceedings of the IEEE International Conference on Robotics and Automation. Piscataway: IEEE Press, 2018: 1887-1893.
[15] WU B C, ZHOU X Y, ZHAO S C, et al. SqueezeSegV2: Improved model structure and unsupervised domain adaptation for road-object segmentation from a LiDAR point cloud[C]//Proceedings of the International Conference on Robotics and Automation. Piscataway: IEEE Press, 2019: 4376-4382.
[16] XU C F, WU B C, WANG Z N, et al. SqueezeSegV3: Spatially-adaptive convolution for efficient point-cloud segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 1-19.
[17] MILIOTO A, VIZZO I, BEHLEY J, et al. RangeNet++: Fast and accurate LiDAR semantic segmentation[C]//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE Press, 2019: 4213-4220.
[18] AKSOY E E, BACI S, CAVDAR S. SalsaNet: Fast road and vehicle segmentation in LiDAR point clouds for autonomous driving[C]//Proceedings of the IEEE Intelligent Vehicles Symposium. Piscataway: IEEE Press, 2020: 926-932.
[19] CORTINHAL T, TZELEPIS G, ERDAL AKSOY E. SalsaNext: Fast, uncertainty-aware semantic segmentation of LiDAR point clouds[C]//Proceedings of the International Symposium on Visual Computing. Berlin: Springer, 2020: 207-222.
[20] QIU H B, YU B S, TAO D C. GFNet: Geometric flow network for 3D point cloud semantic segmentation[EB/OL]. (2022-01-01)[2023-02-03]. https://arxiv.org/abs/2207.02605v2.
[21] ZHU X, ZHOU H, WANG T, et al. Cylindrical and asymmetrical 3D convolution networks for LiDAR segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 9934-9943.
[22] CHENG R, RAZANI R, TAGHAVI E, et al. (AF)2-S3Net: Attentive feature fusion with adaptive feature selection for sparse semantic segmentation network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 12542-12551.
[23] SHI W Z, CABALLERO J, HUSZÁR F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 1874-1883.
[24] DING X H, GUO Y C, DING G G, et al. ACNet: Strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 1911-1920.
[25] BEHLEY J, GARBADE M, MILIOTO A, et al. SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 9297-9307.