3D object detection based on multi-path feature pyramid network for stereo images

SU Kaiqi; YAN Weiqing; XU Jindong

doi:10.13700/j.bh.1001-5965.2021.0525

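School of Computer and Control Engineering, Yantai University, Yantai 264005, China

Funds:

National Natural Science Foundation of China 61801414

National Natural Science Foundation of China 62072391

National Natural Science Foundation of China 62066013

Shandong Provincial Natural Science Foundation ZR2019MF060

Shandong Province Higher Educational Science and Technology Key Program J18KZ016

Corresponding author: YAN Weiqing, E-mail: wqyan@tju.edu.cn

CLC number: TP391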
Publication history
- Received: 2021-09-06
- Accepted: 2021-09-17
- Published online: 2021-10-18
- Issue published: 2022-08-20

Abstract:
3D object detection is an important scene understanding task in computer vision and autonomous driving. However, most current stereo-image-based 3D object detection methods do not fully account for the large differences in scale among objects, so small objects are easily missed and detection accuracy suffers. To address this problem, this paper proposes a 3D object detection method for stereo images based on a multi-path feature pyramid network (MpFPN). MpFPN extends the feature pyramid network by adding a bottom-up path, a top-down path, and connections from the input feature maps to the output feature maps. It thereby provides the union region proposal network with multi-scale features that carry both higher-level semantic information and finer-grained spatial information. Experimental results on the KITTI 3D object detection dataset show that the proposed method achieves better results than the compared methods in the easy, moderate, and hard settings.
Keywords:
- 3D object detection
- feature pyramid network (FPN)
- stereo image
- multi-scale
- deep learning
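
As a rough illustration of the multi-path idea described in the abstract, the NumPy sketch below runs a top-down pass, then a bottom-up pass, and finally adds each input feature map back to the output at the same level. It is not the authors' implementation: the function names (`upsample2x`, `downsample2x`, `multi_path_fuse`), the replacement of learned convolutions by plain addition, the resampling operators, and the order of the passes are all assumptions made so that only the data flow is visible.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    """2x downsampling of a (C, H, W) feature map by 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def multi_path_fuse(inputs):
    """Fuse a pyramid ordered fine -> coarse; each level halves H and W.

    Hypothetical sketch: learned convolutions are replaced by plain
    addition, so this shows the connectivity only, not a trained network.
    """
    n = len(inputs)
    # Top-down path: push coarse, semantically strong features to finer levels.
    td = [None] * n
    td[-1] = inputs[-1]
    for i in range(n - 2, -1, -1):
        td[i] = inputs[i] + upsample2x(td[i + 1])
    # Bottom-up path: push fine spatial detail back to coarser levels.
    bu = [None] * n
    bu[0] = td[0]
    for i in range(1, n):
        bu[i] = td[i] + downsample2x(bu[i - 1])
    # Input-to-output connections: add each input map to its output level.
    return [b + x for b, x in zip(bu, inputs)]

# Toy pyramid: 3 levels, 4 channels, spatial sizes 16, 8, 4.
pyramid = [np.ones((4, 16 // 2**i, 16 // 2**i)) for i in range(3)]
outputs = multi_path_fuse(pyramid)
print([o.shape for o in outputs])
```

Each output keeps the spatial size of its input level, so the fused maps could be passed to a region proposal stage that expects a standard feature pyramid.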

Figure 1. Overall framework of the proposed network

Figure 2. 3D object detection results on the KITTI validation set

Table 1. AP_bev/AP_3D (%) of the car category on the KITTI validation set

Method	Input	IoU=0.5 Easy	IoU=0.5 Moderate	IoU=0.5 Hard	IoU=0.7 Easy	IoU=0.7 Moderate	IoU=0.7 Hard
MonoGRNet [24]	M	54.21/50.51	39.69/36.97	33.06/30.82	24.97/13.88	19.44/10.19	16.30/7.62
M3D-RPN [5]	M	55.37/48.96	42.49/39.57	35.29/33.01	25.94/20.27	21.18/17.06	17.90/15.21
AM3D [25]	M	72.64/68.86	51.82/49.19	44.21/42.24	43.75/32.23	28.39/21.09	23.87/17.26
3DOP [9]	S	55.04/46.04	41.25/34.63	34.55/30.09	12.63/6.55	9.49/5.07	7.59/4.10
TLNet [26]	S	62.46/59.51	45.99/43.71	41.92/37.99	29.22/18.15	21.88/14.26	18.83/13.72
Stereo R-CNN [10]	S	87.13/85.84	74.11/66.28	58.93/57.24	68.50/54.11	48.30/36.69	41.47/31.07
Proposed method	S	87.62/86.49	75.04/72.62	59.31/58.04	69.44/55.26	49.36/37.94	42.11/32.38
Note: S denotes a stereo image pair as input; M denotes a monocular image as input. The value before "/" is AP_bev and the value after "/" is AP_3D.
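
The IoU thresholds in Table 1 decide when a predicted box counts as a correct detection: the overlap with the ground-truth box must reach 0.5 or 0.7. The sketch below shows that check for bird's-eye-view boxes under a simplifying assumption that the boxes are axis-aligned (the benchmark's boxes are oriented, so this is not the evaluation code). The function name `bev_iou_axis_aligned` and the example boxes are hypothetical.

```python
def bev_iou_axis_aligned(a, b):
    """IoU of two axis-aligned boxes given as (x_min, z_min, x_max, z_max)."""
    # Overlap length along each axis, clamped at zero when boxes are disjoint.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iz = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iz
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A 2 m x 4 m ground-truth box and a prediction shifted sideways by 0.5 m.
gt = (0.0, 0.0, 2.0, 4.0)
pred = (0.5, 0.0, 2.5, 4.0)
iou = bev_iou_axis_aligned(gt, pred)
print(iou >= 0.5, iou >= 0.7)  # True False
```

In this toy case a 0.5 m lateral offset is accepted at IoU=0.5 but rejected at IoU=0.7, which is consistent with every method in the table scoring lower at the stricter threshold.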


Table 2. AP_bev and AP_3D (%) of the car category on the KITTI validation set for the proposed method and the Pseudo-LiDAR [11] method

Method	AP_bev (IoU=0.7) Easy	AP_bev (IoU=0.7) Moderate	AP_bev (IoU=0.7) Hard	AP_3D (IoU=0.7) Easy	AP_3D (IoU=0.7) Moderate	AP_3D (IoU=0.7) Hard
Proposed method	69.44	49.36	42.11	55.26	37.94	32.38
PL+FP [11]	69.7	48.1	41.8	54.9	36.4	31.1


Table 3. Ablation experiment of the MpFPN approach on the KITTI dataset (%)

Path	Conn	AP_bev (IoU=0.7) Easy	AP_bev (IoU=0.7) Moderate	AP_bev (IoU=0.7) Hard	AP_3D (IoU=0.7) Easy	AP_3D (IoU=0.7) Moderate	AP_3D (IoU=0.7) Hard
×	×	65.92	46.11	40.00	52.25	34.69	30.27
√	×	68.01	48.15	41.21	54.78	36.88	31.42
√	√	69.44	49.36	42.11	55.26	37.94	32.38
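
The contribution of each component in the ablation table can be read off by differencing consecutive rows. The short script below does this for the moderate setting. The numbers are copied from the table; the variable names, and the reading of `Path` as the added paths and `Conn` as the input-to-output connections, are assumptions.

```python
# AP (%) at IoU=0.7, moderate setting, copied from the ablation table.
rows = {
    "baseline": {"AP_bev": 46.11, "AP_3D": 34.69},
    "+Path": {"AP_bev": 48.15, "AP_3D": 36.88},
    "+Path+Conn": {"AP_bev": 49.36, "AP_3D": 37.94},
}

def gain(later, earlier, metric):
    """Difference in AP (percentage points) between two ablation rows."""
    return round(rows[later][metric] - rows[earlier][metric], 2)

for metric in ("AP_bev", "AP_3D"):
    path_gain = gain("+Path", "baseline", metric)
    conn_gain = gain("+Path+Conn", "+Path", metric)
    print(f"{metric}: Path +{path_gain}, Conn +{conn_gain}")
```

On these figures, enabling `Path` accounts for about 2 percentage points in both metrics, and enabling `Conn` adds roughly 1 more.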


References (28)

[1]	WANG Z, JIA K. Frustum ConvNet: Sliding frustums to aggregate local point-wise features for amodal 3D object detection[C]//Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway: IEEE Press, 2019: 1742-1749.
[2]	SHI S, WANG X, LI H. PointRCNN: 3D object proposal generation and detection from point cloud[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 770-779.
[3]	QI C R, LITANY O, HE K, et al. Deep Hough voting for 3D object detection in point clouds[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 9277-9286.
[4]	SHI S, GUO C, JIANG L, et al. PV-RCNN: Point-voxel feature set abstraction for 3D object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10529-10538.
[5]	BRAZIL G, LIU X. M3D-RPN: Monocular 3D region proposal network for object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 9287-9296.
[6]	KU J, PON A D, WASLANDER S L. Monocular 3D object detection leveraging accurate proposals and shape reconstruction[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 11867-11876.
[7]	CHEN Y, TAI L, SUN K, et al. MonoPair: Monocular 3D object detection using pairwise spatial relationships[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 12093-12102.
[8]	LIU L, WU C, LU J, et al. Reinforced axial refinement network for monocular 3D object detection[C]//European Conference on Computer Vision. Berlin: Springer, 2020: 540-556.
[9]	CHEN X, KUNDU K, ZHU Y, et al. 3D object proposals using stereo imagery for accurate object class detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(5): 1259-1272.
[10]	LI P, CHEN X, SHEN S. Stereo R-CNN based 3D object detection for autonomous driving[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7644-7652.
[11]	WANG Y, CHAO W L, GARG D, et al. Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 8445-8453.
[12]	SUN J, CHEN L, XIE Y, et al. Disp R-CNN: Stereo 3D object detection via shape prior guided instance disparity estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10548-10557.
[13]	GHIASI G, LIN T Y, LE Q V. NAS-FPN: Learning scalable feature pyramid architecture for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7036-7045.
[14]	ZHAO Q, SHENG T, WANG Y, et al. M2Det: A single-shot object detector based on multi-level feature pyramid network[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2019: 9259-9266.
[15]	TAN M, PANG R, LE Q V. EfficientDet: Scalable and efficient object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10781-10790.
[16]	CAO S, ZHANG X W, MA J W. Transscale feature aggregation network for multiscale pedestrian detection[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(9): 1786-1796 (in Chinese). doi: 10.13700/j.bh.1001-5965.2020.0069
[17]	LI X G, FU C P, LI X L, et al. Improved Faster R-CNN for multi-scale object detection[J]. Journal of Computer-Aided Design & Computer Graphics, 2019, 31(7): 1095-1101 (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-JSJF201907005.htm
[18]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2117-2125.
[19]	LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8759-8768.
[20]	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[21]	HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2961-2969.
[22]	DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2009: 248-255.
[23]	GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2012: 3354-3361.
[24]	QIN Z, WANG J, LU Y. MonoGRNet: A geometric reasoning network for monocular 3D object localization[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2019: 8851-8858.
[25]	MA X, WANG Z, LI H, et al. Accurate monocular 3D object detection via color-embedded 3D reconstruction for autonomous driving[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 6851-6860.
[26]	QIN Z, WANG J, LU Y. Triangulation learning network: From monocular to stereo 3D object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7615-7623.
[27]	CHANG J R, CHEN Y S. Pyramid stereo matching network[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 5410-5418.
[28]	KU J, MOZIFIAN M, LEE J, et al. Joint 3D proposal generation and object detection from view aggregation[C]//Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway: IEEE Press, 2018: 1-8.