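Fire-and-smoke detection algorithm based on convolutional attention and feature fusion

TIAN Jiaqi (田佳麒); QIN Guoxuan (秦国轩); ZHANG Wei (张为)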

doi:10.13700/j.bh.1001-5965.2024.0173

School of Microelectronics, Tianjin University, Tianjin 300072, China

Funds:

National Key Research and Development Program of China (2022YFC3006302)

Corresponding author: E-mail: tjuzhangwei@tju.edu.cn

CLC number: TP391.41
Publication history
- Received: 2024-03-26
- Accepted: 2024-07-30
- Published online: 2024-08-08
- Issue publication date: 2026-05-26

Abstract:
To address the imbalance between detection accuracy and speed in real-world fire scenarios, this paper proposes a fire-and-smoke detection algorithm that strengthens spatial feature extraction and multi-scale feature fusion. First, the extraction of high-level semantic information in the backbone network is improved by embedding a receptive field convolutional attention module, which enhances the model's feature extraction capability. Second, an improved strong feature fusion network is introduced to further fuse low-level spatial information with high-level semantic information, improving model accuracy. Third, to meet real-time requirements, the partial convolution (PConv) module is used to make the backbone network and the detection head lightweight, reducing the model's parameter count and memory access without loss of accuracy. Finally, the regression loss function is adjusted to improve the model's detection capability. Experimental results show that, on a self-built fire dataset, the improved algorithm raises the mean average precision at an IoU threshold of 0.5 (mAP50) and over IoU thresholds 0.5:0.95 (mAP50:95) by 2.1 and 2.9 percentage points, respectively, demonstrating its advantage in fire-and-smoke detection. On the Pascal VOC 07+12 public dataset, mAP50 and mAP50:95 increase by 1.4 and 2.4 percentage points, respectively, indicating good generalization performance.

Keywords:
- fire-and-smoke detection
- multi-scale features
- feature fusion
- lightweight detection head
- loss function
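The abstract's claim that PConv lowers parameter count and memory access can be illustrated with a little arithmetic. The sketch below uses the cost expressions given for partial convolution in reference [11]: a k×k convolution over c channels costs about h·w·k²·c² multiply-accumulates and accesses about h·w·2c + k²·c² memory elements, and PConv applies that convolution to only a fraction of the channels. The feature-map size, channel count, kernel size, and the 1/4 partial ratio are illustrative assumptions, not values reported in this paper.

```python
def conv_cost(h, w, c, k):
    """Cost of a k x k convolution mapping c -> c channels on an h x w map.

    Returns (flops, memory_access, params). FLOPs are counted as
    multiply-accumulates and bias terms are ignored.
    """
    flops = h * w * k * k * c * c
    memory_access = h * w * 2 * c + k * k * c * c
    params = k * k * c * c
    return flops, memory_access, params


def pconv_cost(h, w, c, k, ratio=4):
    """Cost of a partial convolution that convolves only c // ratio channels.

    The remaining channels are passed through untouched, so they add no
    convolution cost. `ratio` is an assumed hyperparameter.
    """
    c_p = c // ratio
    return conv_cost(h, w, c_p, k)


# Assumed example layer: 40 x 40 feature map, 256 channels, 3 x 3 kernel.
full = conv_cost(40, 40, 256, 3)
partial = pconv_cost(40, 40, 256, 3)

flops_ratio = partial[0] / full[0]
memory_ratio = partial[1] / full[1]
params_ratio = partial[2] / full[2]

print(f"FLOPs ratio:         {flops_ratio:.4f}")
print(f"Memory-access ratio: {memory_ratio:.4f}")
print(f"Parameter ratio:     {params_ratio:.4f}")
```

With a 1/4 partial ratio, the convolution's FLOPs and weights shrink by a factor of 16, while memory access shrinks by a smaller factor that depends on how large the feature map is relative to the weights.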

Figure 1. Overall structure of the proposed algorithm

Figure 2. Receptive field convolutional attention module

Figure 3. Structure of BottleNeck_RF

Figure 4. Gather-and-distribute structure

Figure 5. Structure of the lightweight adjacent layer fusion module and the information injection module

Figure 6. EMSC module and BottleNeck_EMSC structure

Figure 7. PConv module, Faster module, and lightweight detection head structure

Figure 8. Schematic diagram of MPDIoU

Figure 9. Sample images from the dataset

Figure 10. Distribution of targets

Figure 11. Visual comparison of the ablation experiments

Figure 12. Visual comparison of the detection results of different algorithms

Table 1. Experimental environment configuration

Experimental environment	Configuration
CPU	Intel Xeon Silver 4310
GPU	NVIDIA A40
Operating system	Ubuntu 18.04.6
Programming language	Python 3.11.4
Deep learning framework	PyTorch 2.0.1

Table 2. Comparative analysis of the ablation experiments

Model	Receptive field convolutional attention module	Strong feature fusion network	PConv	MPDIoU	mAP50/%	mAP50:95/%	Parameters	Floating-point operation rate/(10⁹·s⁻¹)
Baseline algorithm					95.2	69.1	11.1×10⁶	28.6
Improvement 1	√				96.2	70.5	11.2×10⁶	28.8
Improvement 2	√	√			96.7	71.4	15.1×10⁶	33.0
Improvement 3	√	√	√		96.7	71.4	13.5×10⁶	24.2
Proposed algorithm	√	√	√	√	97.3	72.0	13.5×10⁶	24.2
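The contribution of each ablation step in Table 2 is easier to see as a per-step difference. The short sketch below takes the mAP values from the table rows and computes the gain of each configuration over the previous one, plus the total gain over the baseline. The row labels and the helper name `stepwise_gains` are illustrative choices, and the gains are in percentage points.

```python
# (label, mAP50 / %, mAP50:95 / %) copied from the rows of Table 2.
ablation = [
    ("Baseline", 95.2, 69.1),
    ("Improvement 1", 96.2, 70.5),
    ("Improvement 2", 96.7, 71.4),
    ("Improvement 3", 96.7, 71.4),
    ("Proposed", 97.3, 72.0),
]


def stepwise_gains(rows):
    """Return (label, mAP50 gain, mAP50:95 gain) of each row over the previous one."""
    gains = []
    for (_, prev50, prev5095), (name, cur50, cur5095) in zip(rows, rows[1:]):
        # Round to one decimal to match the precision reported in the table.
        gains.append((name, round(cur50 - prev50, 1), round(cur5095 - prev5095, 1)))
    return gains


gains = stepwise_gains(ablation)
total = (
    round(ablation[-1][1] - ablation[0][1], 1),
    round(ablation[-1][2] - ablation[0][2], 1),
)

for name, g50, g5095 in gains:
    print(f"{name}: mAP50 {g50:+.1f}, mAP50:95 {g5095:+.1f}")
print(f"Total over baseline: mAP50 {total[0]:+.1f}, mAP50:95 {total[1]:+.1f}")
```

The totals agree with the 2.1 and 2.9 figures quoted in the abstract, and the PConv step (Improvement 3) leaves both mAP values unchanged while reducing the parameter count.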

Table 3. Comparison of attention modules

Model	mAP50/%	mAP50:95/%
Baseline algorithm	95.2	69.1
+GAM^[13]	96.0	70.0
+CBAM^[14]	95.7	69.7
+Receptive field convolutional attention	96.2	70.5

Table 4. Comparison of neck networks

Model	mAP50/%	mAP50:95/%
Baseline algorithm	95.2	69.1
+Reference [7]	96.2	71.2
+Strong feature fusion network	96.5	71.3

Table 5. Comparison of loss functions

Loss function	mAP50/%	mAP50:95/%
Baseline algorithm (CIoU)	95.2	69.1
+Focal-EIoU^[15]	95.5	69.1
+SIoU^[16]	95.8	69.0
+MPDIoU^[12]	96.3	69.7
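For readers unfamiliar with the MPDIoU loss compared in Table 5, the following is a minimal sketch of the definition given in reference [12]: the IoU of the two boxes is penalized by the squared distances between their top-left corners and between their bottom-right corners, each normalized by the squared diagonal of the input image. The function name, the `(x1, y1, x2, y2)` box layout, and the example values are assumptions for illustration; this is not the training code used in this paper.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """Return 1 - MPDIoU for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2  # top-left corners
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2  # bottom-right corners

    # Normalize by the squared diagonal of the input image.
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou


# Assumed toy example on a 10 x 10 image.
same = mpdiou_loss((1, 1, 3, 3), (1, 1, 3, 3), 10, 10)
shifted = mpdiou_loss((0, 0, 2, 2), (1, 1, 3, 3), 10, 10)
disjoint = mpdiou_loss((0, 0, 1, 1), (5, 5, 6, 6), 10, 10)
print(same, shifted, disjoint)
```

Unlike plain IoU, the corner-distance terms keep the loss informative when the boxes do not overlap: the disjoint pair above yields a loss greater than 1 that still decreases as the predicted box moves toward the target.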

Table 6. Comparison of detection results of different algorithms

Algorithm	mAP50/%	mAP50:95/%	Parameters	Floating-point operation rate/(10⁹·s⁻¹)
Faster R-CNN^[17]	88.2	51.5	41.1×10⁶	91.0
SSD512^[18]	95.7	62.1	24.5×10⁶	87.9
YOLOv3-tiny^[19]	94.4	61.0	8.7×10⁶	13.0
YOLOv5m	96.2	68.8	21.2×10⁶	49.0
YOLOv6s^[20]	96.7	71.0	18.5×10⁶	45.3
YOLOv7-tiny^[21]	96.2	69.4	6.1×10⁶	13.2
YOLOv8s	95.2	69.1	11.1×10⁶	28.6
YOLOv9-T^[22]	96.6	70.9	2.7×10⁶	11.0
Deformable-DETR^[23]	95.7	60.1	40.0×10⁶	173.0
RT-DETR-Res18^[24]	96.1	70.4	20.0×10⁶	60.5
Proposed algorithm	97.3	72.0	13.5×10⁶	24.2

Table 7. Comparison of detection results of different algorithms on the public dataset

Algorithm	mAP50/%	mAP50:95/%	Parameters	Floating-point operation rate/(10⁹·s⁻¹)
Faster R-CNN^[17]	65.9	36.5	41.2×10⁶	91.1
SSD512^[18]	65.4	37.6	27.2×10⁶	90.4
YOLOv3-tiny^[19]	54.4	27.9	8.7×10⁶	13.1
YOLOv5m	77.5	53.5	21.2×10⁶	49.1
YOLOv6s^[20]	75.8	53.5	18.5×10⁶	45.4
YOLOv7-tiny^[21]	70.1	43.4	6.1×10⁶	13.3
YOLOX-s^[25]	72.3	44.7	8.9×10⁶	26.7
YOLOv8s	76.6	55.4	11.1×10⁶	28.7
YOLOv9-T^[22]	71.8	52.2	2.7×10⁶	11.1
Deformable-DETR^[23]	77.5	52.7	40.0×10⁶	173.0
RT-DETR-Res18^[24]	72.4	53.1	20.0×10⁶	60.5
Proposed algorithm	78.0	57.8	13.5×10⁶	24.3

References (25)

[1]	SHARMA J, GRANMO O C, GOODWIN M, et al. Deep convolutional neural networks for fire detection in images[C]//Proceedings of the Engineering Applications of Neural Networks. Berlin: Springer, 2017: 183-193.
[2]	HOSSEINI A, HASHEMZADEH M, FARAJZADEH N. UFS-Net: a unified flame and smoke detection method for early detection of fire in video surveillance applications using CNNs[J]. Journal of Computational Science, 2022, 61: 101638.
[3]	ZHANG R, ZHANG W. Fire detection algorithm based on improved GhostNet-FCOS[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(10): 1891-1899 (in Chinese).
[4]	QIN R, ZHANG W. Multi-scale fire detection algorithm with an anchor free structure[J]. Journal of Xidian University (Natural Science), 2022, 49(6): 111-119 (in Chinese).
[5]	ZHANG X, LIU C, YANG D G, et al. RFAConv: innovating spatial attention and standard convolutional operation[EB/OL]. (2023-04-06)[2024-01-09]. https://doi.org/10.48550/arXiv.2304.03198.
[6]	WANG K X, LIEW J H, ZOU Y T, et al. PANet: few-shot image semantic segmentation with prototype alignment[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 9196-9205.
[7]	WANG C C, HE W, NIE Y, et al. Gold-YOLO: efficient object detector via gather-and-distribute mechanism[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 51094-51112.
[8]	DING X H, ZHANG X Y, MA N N, et al. RepVGG: making VGG-style ConvNets great again[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13728-13737.
[9]	WAN Q, HUANG Z L, LU J C, et al. SeaFormer: squeeze-enhanced axial transformer for mobile visual recognition[EB/OL]. (2023-01-30)[2024-01-11]. https://arxiv.org/abs/2301.13156.
[10]	ZHANG H, ZU K K, LU J, et al. EPSANet: an efficient pyramid squeeze attention block on convolutional neural network[C]//Proceedings of the Computer Vision–ACCV 2022. Cham: Springer, 2023: 541-557.
[11]	CHEN J R, KAO S H, HE H, et al. Run, don’t walk: chasing higher FLOPS for faster neural networks[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2023: 12021-12031.
[12]	MA S L, XU Y. MPDIoU: a loss for efficient and accurate bounding box regression[EB/OL]. (2023-07-14)[2024-04-15]. https://arxiv.org/abs/2307.07662.
[13]	LIU Y C, SHAO Z R, HOFFMANN N. Global attention mechanism: retain information to enhance channel-spatial interactions[EB/OL]. (2021-12-10)[2024-01-26]. https://arxiv.org/abs/2112.05561.
[14]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the Computer Vision–ECCV 2018. Cham: Springer, 2018: 3-19.
[15]	ZHANG Y F, REN W Q, ZHANG Z, et al. Focal and efficient IOU loss for accurate bounding box regression[J]. Neurocomputing, 2022, 506: 146-157.
[16]	GEVORGYAN Z. SIoU loss: more powerful learning for bounding box regression[EB/OL]. (2022-05-25)[2024-01-30]. https://arxiv.org/abs/2205.12740.
[17]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[18]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the Computer Vision–ECCV 2016. Cham: Springer, 2016: 21-37.
[19]	ADARSH P, RATHI P, KUMAR M. YOLOv3-tiny: object detection and recognition using one stage improved model[C]//Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems. Piscataway: IEEE Press, 2020: 687-694.
[20]	LI C Y, LI L L, JIANG H L, et al. YOLOv6: a single-stage object detection framework for industrial applications[EB/OL]. (2022-09-07)[2024-02-03]. https://arxiv.org/abs/2209.02976.
[21]	WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2023: 7464-7475.
[22]	WANG C Y, YEH I H, LIAO H Y M. YOLOv9: learning what you want to learn using programmable gradient information[EB/OL]. (2024-02-29)[2024-03-05]. https://arxiv.org/abs/2402.13616.
[23]	ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. (2021-03-18)[2024-03-06]. https://arxiv.org/abs/2010.04159.
[24]	ZHAO Y A, LV W Y, XU S L, et al. DETRs beat YOLOs on real-time object detection[C]//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2024: 16965-16974.
[25]	WANG H, WANG L H, CHEN H, et al. Waste-YOLO: towards high accuracy real-time abnormal waste detection in waste-to-energy power plant for production safety[J]. Measurement Science and Technology, 2024, 35(1): 016001.