Small target vehicle detection based on perceptual enhancement and multi-scale feature fusion

沈瑜; 李阳阳; 李博昊; 高宝渠; 魏子易; 白珊

doi:10.13700/j.bh.1001-5965.2024.0124


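School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

Funds:

National Natural Science Foundation of China (61861025, 62241106)

Corresponding author:
E-mail: 18609311366@163.com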

CLC number: V221⁺.3; TB553
Publication history
- Received: 2024-03-05
- Accepted: 2024-06-21
- Published online: 2024-07-19
- Issue published: 2026-05-26


Abstract:
Small vehicle targets carry little information and have weak feature representations, which leads to low detection accuracy and missed detections in existing algorithms. To address this, a small target vehicle detection algorithm based on perceptual enhancement and multi-scale feature fusion is proposed. First, a spatial local feature perception enhancement backbone network, SLFPB-ST, is designed to address the severe loss of small-target feature information during feature extraction. Second, a multi-scale information fusion network (MSIFN) is proposed, which assigns weights during fusion so that more detailed information is attended to; a large objects restraint block (LRB) is added to MSIFN to constrain large-object features while preserving the feature representation of small targets. Finally, an anchor-free mechanism is adopted to reduce missed detections of small targets and improve detection accuracy. Experimental results on the UA-DETRAC and Vehicle datasets show that, compared with the Swin Transformer algorithm, the proposed algorithm improves mAP, AP50, and AP75 by 5.15, 9.35, and 4.35 percentage points, respectively, while the parameter size increases by 14 MB and the detection speed decreases by 0.4 frames/s, which indicates that the algorithm has good robustness and applicability.
Keywords: small target vehicle detection; Swin Transformer; multi-scale feature fusion; dilated convolution; GIOU
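The keyword list names GIOU (generalized intersection over union, reference [22]), but this page does not reproduce the formula. The following is a minimal sketch of the metric as defined in [22], assuming axis-aligned boxes in `(x1, y1, x2, y2)` format with positive area. The function name `giou` and the box format are illustrative assumptions, and this is not the authors' training code.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for both boxes (positive area).
    The result lies in (-1, 1]; a regression loss is commonly taken as 1 - giou.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (width/height clamp to zero when boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # Penalize the empty part of the enclosing box that neither input covers.
    return iou - (enclose - union) / enclose


if __name__ == "__main__":
    print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(giou((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes, negative value
```

Unlike plain IoU, the value stays informative for disjoint boxes: it becomes more negative as the boxes move apart, which is why it can serve as a bounding-box regression signal even when a small predicted box does not yet overlap its target.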

Figure 1. Framework of the proposed algorithm

Figure 2. Network structure of Swin Transformer

Figure 3. Network structure of Swin Transformer Block

Figure 4. SLFPB-ST backbone network

Figure 5. Spatial local feature perception block

Figure 6. Multi-scale information fusion network

Figure 7. Process of RFA and ASF

Figure 8. Large objects restraint block

CAP: channel average pooling; σ: Sigmoid; CMP: channel max pooling; $ \otimes $: matrix product.

Figure 9. Total loss curve

Figure 10. Examples of samples from the datasets

Figure 11. Comparison of per-category detection accuracy between the proposed algorithm and the original algorithm

Figure 12. Comparison of small target detection results on the Vehicle dataset

Figure 13. Comparison of occluded target detection results on the UA-DETRAC dataset

Figure 14. Comparison of detection results under different lighting conditions

Figure 15. Comparison of visualized heat maps

Table 1. Comparison of detection results between the proposed algorithm and mainstream algorithms

Algorithm	Backbone	mAP/%	AP50/%	AP75/%	AP(S)/%	AP(M)/%	AP(L)/%	Parameters/MB	Detection speed/(frames·s⁻¹)
Faster R-CNN^[5]	ResNet101	51.85	73.75	56.27	33.28	54.84	64.18	243.6	26.8
SSD^[7]	VGG16	49.35	66.42	48.35	28.24	51.52	60.80	92.1	52.4
RetinaNet^[8]	ResNet50	54.85	77.15	59.16	36.65	59.29	62.21	96.6	46.8
CenterNet^[9]	DLA-34	55.65	79.04	59.24	36.12	58.14	66.10	88.8	45.5
YOLOv4^[11]	Darknet53	56.85	81.24	62.12	36.55	60.10	66.90	198.5	30.6
DETR^[14]	ResNet50	48.74	68.22	47.20	27.56	49.50	58.85	142	35.7
Original algorithm^[15]	Swin-T	53.45	74.85	56.45	34.64	55.40	64.65	114	42.8
Proposed	SLFPB-ST	58.60	84.20	60.80	37.15	57.60	67.56	128	42.4
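
The improvements quoted in the abstract can be checked directly against the last two rows of Table 1. The short sketch below copies those two rows and takes the differences; the dictionary keys and the helper name `deltas` are illustrative only.

```python
# Values copied from Table 1: original Swin-T baseline vs. the proposed SLFPB-ST model.
baseline = {"mAP": 53.45, "AP50": 74.85, "AP75": 56.45, "params_mb": 114, "fps": 42.8}
proposed = {"mAP": 58.60, "AP50": 84.20, "AP75": 60.80, "params_mb": 128, "fps": 42.4}


def deltas(new, old, ndigits=2):
    """Absolute difference new - old for every metric, rounded to hide float noise."""
    return {key: round(new[key] - old[key], ndigits) for key in old}


gains = deltas(proposed, baseline)

if __name__ == "__main__":
    for name, value in gains.items():
        print(f"{name}: {value:+}")
```

The accuracy gains are absolute differences in percentage points (for example, 58.60 minus 53.45 for mAP), not relative improvements.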


Table 2. Comparison of results of different algorithms

Algorithm	mAP/%	mAP gain/%	AP(S)/%	AP(S) gain/%	Parameters/MB	Parameter reduction/MB	Detection speed/(frames·s⁻¹)	Detection speed gain/(frames·s⁻¹)
Ref. [16]	52.23	1.85	33.50	1.64	278	−36	24.6	1.2
Ref. [17]	56.40	0.70	26.64	0.65	124	14	42.0	4.5
Ref. [18]	51.65	3.97	29.28	1.75	198	−46	38.8	−9.1
Proposed	58.60	5.15	37.15	2.51	128	−14	42.4	−0.4

Table 3. Performance comparison between the proposed algorithm and original algorithm

Algorithm	AP50/%	F₁/%	Precision/%	Recall/%	Detection speed/(frames·s⁻¹)
Original algorithm	74.85	73.60	83.42	72.65	42.8
Proposed	84.20	85.16	91.30	79.48	42.4

Table 4. Results of ablation experiments

Swin-T	SLFPB	MSIFN	LRB	Detection speed/(frames·s⁻¹)	mAP/%
√				35.2	53.45
√	√			38.5	53.95
√		√		36.9	55.12
√	√	√		33.4	56.74
√	√	√	√	42.4	58.60
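
The ablation rows in Table 4 can be read as mAP gains over the Swin-T-only configuration. The sketch below copies the mAP column and computes those gains; the tuple keys `(SLFPB, MSIFN, LRB)` and the variable names are illustrative, and a comparison of single-run differences like this is suggestive rather than statistical evidence.

```python
# (SLFPB, MSIFN, LRB) flags -> mAP/% as listed in Table 4; Swin-T is present in every row.
ablation = {
    (False, False, False): 53.45,
    (True, False, False): 53.95,
    (False, True, False): 55.12,
    (True, True, False): 56.74,
    (True, True, True): 58.60,
}

base = ablation[(False, False, False)]

# Gain of each configuration over the Swin-T-only baseline, in percentage points.
gain = {cfg: round(m_ap - base, 2) for cfg, m_ap in ablation.items()}

# Sum of the two single-module gains vs. the gain when both modules are enabled.
sum_of_single = round(gain[(True, False, False)] + gain[(False, True, False)], 2)
combined = gain[(True, True, False)]

# Additional mAP contributed by LRB on top of SLFPB + MSIFN.
lrb_increment = round(ablation[(True, True, True)] - ablation[(True, True, False)], 2)

if __name__ == "__main__":
    print("gain per configuration:", gain)
    print("sum of single-module gains:", sum_of_single)
    print("gain with SLFPB + MSIFN together:", combined)
    print("extra gain from LRB:", lrb_increment)
```

In these numbers the gain with SLFPB and MSIFN enabled together exceeds the sum of their individual gains, which is consistent with the two modules complementing each other.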


References (22)

[1]	XIAO Z Y, ZHANG G B. An attention-based odometry framework for multisensory unmanned ground vehicles (UGVs)[J]. Drones, 2023, 7(12): 699.
[2]	WANG K, XIANG Q X. Improved Yolov4 algorithm for vehicle weak object detection[J]. Journal of Chinese Inertial Technology, 2023, 31(8): 797-805 (in Chinese).
[3]	LIANG H, SEO S. UAV low-altitude remote sensing inspection system using a small target detection network for Helmet wear detection[J]. Remote Sensing, 2023, 15(1): 196.
[4]	GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2016: 1440-1448.
[5]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[6]	CAI Z W, VASCONCELOS N. Cascade R-CNN: high quality object detection and instance segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1483-1498.
[7]	CHEN W P, QIAO Y T, LI Y J. Inception-SSD: an improved single shot detector for vehicle detection[J]. Journal of Ambient Intelligence and Humanized Computing, 2022, 13(11): 5047-5053.
[8]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2999-3007.
[9]	DUAN K W, BAI S, XIE L X, et al. CenterNet: keypoint triplets for object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 6568-6577.
[10]	DING X W, YANG R D. Vehicle and parking space detection based on improved YOLO network model[J]. Journal of Physics: Conference Series, 2019, 1325(1): 012084.
[11]	SUMIT S S, WATADA J, ROY A, et al. In object detection deep learning methods, YOLO shows supremum to Mask R-CNN[J]. Journal of Physics: Conference Series, 2020, 1529(4): 042086.
[12]	SU X Y, LIU H M, TAO L F, et al. An end-to-end framework for remaining useful life prediction of rolling bearing based on feature pre-extraction mechanism and deep adaptive Transformer model[J]. Computers & Industrial Engineering, 2021, 161: 107531.
[13]	KOJIMA T, IWASAWA Y, MATSUO Y. Robustifying vision Transformer without retraining from scratch using attention-based test-time adaptation[J]. New Generation Computing, 2023, 41(1): 5-24.
[14]	QI F, CHEN G M, LIU J Y, et al. End-to-end pest detection on an improved deformable DETR with multihead criss cross attention[J]. Ecological Informatics, 2022, 72: 101902.
[15]	LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2022: 9992-10002.
[16]	XU X. Research on a small target object detection algorithm for electric transmission lines based on convolutional neural network[J]. IAENG International Journal of Computer Science, 2023, 50(2): 375-380.
[17]	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[EB/OL]. (2017-01-23)[2024-03-01]. https://arxiv.org/abs/1701.06659.
[18]	ZHANG Q, ZHANG H Y, LU X W. Adaptive feature fusion for small object detection[J]. Applied Sciences, 2022, 12(22): 11854.
[19]	CHEN Y K, ZHANG P Z, KONG T, et al. Scale-aware automatic augmentations for object detection with dynamic training[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 2367-2383.
[20]	LIU S T, HUANG D, WANG Y H. Learning spatial fusion for single-shot object detection[EB/OL]. (2019-11-25)[2024-03-01]. https://arxiv.org/abs/1911.09516.
[21]	ZOPH B, CUBUK E D, GHIASI G, et al. Learning data augmentation strategies for object detection[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 566-583.
[22]	REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 658-666.