A multi-task traffic scene detection model based on cross-attention

NIU Guochen, WANG Xiaonan

Citation: NIU G C, WANG X N. A multi-task traffic scene detection model based on cross-attention[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(5): 1491-1499 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0610


doi: 10.13700/j.bh.1001-5965.2022.0610
Funds: Tianjin Science and Technology Plan Project (17ZXHLGX00120); Tianjin Research Innovation Project for Postgraduate Students (2021YJSO2S30); the Fundamental Research Funds for the Central Universities (3122022PY17); Civil Aviation University of China Postgraduate Research and Innovation Project (2021YJS025)
    Corresponding author: E-mail: niu_guochen@139.com

  • CLC number: TP391.4

  • Abstract:

    Perception is fundamental and critical to autonomous driving, yet most single models cannot perform detection tasks such as traffic object, drivable area, and lane line detection simultaneously. A multi-task traffic scene detection model based on cross-attention is proposed that detects traffic objects, drivable areas, and lane lines at the same time. An encoder-decoder network extracts initial features, which are enhanced by hybrid dilated convolution; a cross-attention module then yields separate segmentation and detection feature maps. Semantic segmentation is performed on the segmentation feature map, and object detection on the detection feature map. Experimental results show that, on the challenging BDD100K dataset, the proposed model outperforms other multi-task models in task accuracy and overall computational efficiency.
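
    The pipeline named in the abstract (dilated-convolution feature enhancement, then a cross-attention split into two task-specific maps) can be sketched in code. The PyTorch sketch below is an illustration under stated assumptions, not the paper's exact design: channel counts, dilation rates, head counts, and the use of nn.MultiheadAttention are stand-ins for the unstated internals of the HDC and CA modules (Figures 3 and 5).

```python
# Illustrative sketch only: all sizes and the attention formulation
# are assumptions, not the configuration reported in the paper.
import torch
import torch.nn as nn


class HDCBlock(nn.Module):
    """Hybrid dilated convolution: stacked 3x3 convs whose dilation
    rates (e.g., 1, 2, 5) share no common factor, so the combined
    receptive field has no holes, avoiding the gridding problem
    illustrated in Figure 4."""
    def __init__(self, channels: int, rates=(1, 2, 5)):
        super().__init__()
        self.body = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)


class CrossAttention(nn.Module):
    """Toy cross-attention: each task branch queries the other branch's
    features, producing one detection and one segmentation map."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.det_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.seg_attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, f_det: torch.Tensor, f_seg: torch.Tensor):
        b, c, h, w = f_det.shape
        det = f_det.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        seg = f_seg.flatten(2).transpose(1, 2)
        det_out, _ = self.det_attn(det, seg, seg)  # detection queries segmentation
        seg_out, _ = self.seg_attn(seg, det, det)  # segmentation queries detection
        det_map = det_out.transpose(1, 2).reshape(b, c, h, w)
        seg_map = seg_out.transpose(1, 2).reshape(b, c, h, w)
        return det_map, seg_map


# Usage: enhance shared features with HDC, then split into task maps.
feats = torch.randn(1, 64, 40, 40)              # hypothetical backbone output
enhanced = HDCBlock(64)(feats)
det_map, seg_map = CrossAttention(64)(enhanced, enhanced)
print(det_map.shape, seg_map.shape)             # both (1, 64, 40, 40)
```

    Feeding the same tensor into both attention inputs here only keeps the sketch self-contained; in the model described, the two inputs would come from the two decoder branches.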


  • Figure 1.  Results of multi-task traffic scene detection

    Figure 2.  SPPF structure

    Figure 3.  CA structure

    Figure 4.  Illustration of the gridding problem

    Figure 5.  HDC structure

    Figure 6.  Multi-task traffic scene detection model structure

    Figure 7.  Visualization of the traffic object detection results

    Figure 8.  Visualization of the drivable area detection results

    Figure 9.  Visualization of the lane detection results

    Table 1.  Comparison of the traffic object detection results (unit: %)

    Model            Recall  mAP50
    Faster R-CNN[5]  77.2    55.6
    YOLOv5s          86.8    77.2
    MultiNet[16]     81.3    60.2
    DLT-Net[19]      89.4    68.4
    YOLOP[20]        89.2    76.5
    TDL-YOLO         88.6    78.0

    Table 2.  Comparison of the drivable area detection results (unit: %)

    Model         mIoU
    ERFNet[11]    68.7
    PSPNet[1]     89.6
    MultiNet[16]  71.6
    DLT-Net[19]   72.1
    YOLOP[20]     91.5
    TDL-YOLO      91.4

    Table 3.  Comparison of the lane detection results (unit: %)

    Model         Accuracy  IoU
    ENet[22]      34.12     14.64
    SCNN[2]       35.79     15.84
    ENet-SAD[13]  36.56     16.02
    YOLOP[20]     70.50     26.20
    TDL-YOLO      72.30     26.50

    Table 4.  Detection results under different lighting conditions

    Model      Lighting  mAP50/%  mIoU/%  Accuracy/%  IoU/%
    YOLOP[20]            77.8     91.7    71.3        26.6
                         77.4     91.0    71.0        26.3
                         73.7     91.2    69.1        25.5
    TDL-YOLO             79.5     91.5    73.1        26.9
                         79.0     91.4    72.8        26.5
                         75.0     91.2    71.2        26.0

    Table 5.  Ablation experiment (unit: %)

    Scheme        Recall  mAP50  mIoU  Accuracy  IoU
    Detection     88.5    77.0
    Segmentation                 92.0  74.7      27.0
    Multi-task    88.6    78.0   91.4  72.3      26.5
  • [1] ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6230-6239.
    [2] PAN X G, SHI J P, LUO P, et al. Spatial as deep: Spatial CNN for traffic scene understanding[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 12301.
    [3] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 580-587.
    [4] GIRSHICK R. Fast R-CNN[C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 1440-1448.
    [5] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. doi: 10.1109/TPAMI.2016.2577031
    [6] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 21-37.
    [7] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 779-788.
    [8] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2980-2988.
    [9] TIAN Z, SHEN C H, CHEN H, et al. FCOS: Fully convolutional one-stage object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 9627-9636.
    [10] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3431-3440.
    [11] ROMERA E, ÁLVAREZ J M, BERGASA L M, et al. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 263-272. doi: 10.1109/TITS.2017.2750080
    [12] YU C Q, WANG J B, PENG C, et al. BiSeNet: Bilateral segmentation network for real-time semantic segmentation[C]// Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 325-341.
    [13] HOU Y N, MA Z, LIU C X, et al. Learning lightweight lane detection CNNs by self attention distillation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 1013-1021.
    [14] TABELINI L, BERRIEL R, PAIXAO T M, et al. Keep your eyes on the lane: Real-time attention-guided lane detection[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 294-302.
    [15] QIN Z Q, WANG H Y, LI X. Ultra fast structure-aware deep lane detection[C]//European Conference on Computer Vision. Berlin: Springer, 2020: 276-291.
    [16] TEICHMANN M, WEBER M, ZOELLNER M, et al. MultiNet: Real-time joint semantic reasoning for autonomous driving[C]//Proceedings of the IEEE Intelligent Vehicles Symposium. Piscataway: IEEE Press, 2018: 1013-1020.
    [17] LIU Z W, FAN S H, QI M Y, et al. Multi-task perception algorithm of autonomous driving based on temporal fusion[J]. Journal of Traffic and Transportation Engineering, 2021, 21(4): 223-234 (in Chinese).
    [18] LIU J, CHEN L L, LI H B. A real-time detection model for multi-task traffic objects based on humanoid vision[J]. Automotive Engineering, 2021, 43(1): 50-58 (in Chinese).
    [19] QIAN Y Q, DOLAN J M, YANG M. DLT-Net: Joint detection of drivable areas, lane lines, and traffic objects[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(11): 4670-4679.
    [20] WU D, LIAO M W, ZHANG W T, et al. YOLOP: You only look once for panoptic driving perception[EB/OL]. (2021-08-25) [2022-06-25]. https://arxiv.org/abs/2108.11250.
    [21] YU F, CHEN H F, WANG X, et al. BDD100K: A diverse driving dataset for heterogeneous multitask learning[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 2633-2642.
    [22] PASZKE A, CHAURASIA A, KIM S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[EB/OL]. (2016-06-30)[2022-06-25]. https://arxiv.org/abs/1606.02147.
Publication history
  • Received: 2022-07-12
  • Accepted: 2022-08-14
  • Published online: 2023-02-01
  • Issue date: 2024-05-29
