Fire-and-smoke detection algorithm based on convolutional attention and feature fusion
Abstract: To address the imbalance between detection accuracy and speed in real-world fire scenarios, this paper proposes a fire-and-smoke detection algorithm that strengthens spatial feature extraction and multi-scale feature fusion. First, a receptive field convolutional attention module is embedded in the backbone network to improve its extraction of high-level semantic information and thus the model's feature extraction capability. Second, an improved strong feature fusion network further fuses low-level spatial information with high-level semantic information, raising model accuracy. Third, to meet real-time requirements, the partial convolution (PConv) module is used to make the backbone network and the detection head more lightweight, reducing the model's parameter count and memory access without loss of accuracy. Finally, the regression loss function is adjusted to improve the model's detection capability. Experimental results show that, on a self-built fire dataset, the proposed algorithm raises the mean average precision at an IoU threshold of 0.5 (mAP50) and over IoU thresholds 0.5:0.95 (mAP50:95) by 2.1 and 2.9 percentage points, respectively, demonstrating its advantage in fire-and-smoke detection. On the public Pascal VOC 07+12 dataset, mAP50 and mAP50:95 rise by 1.4 and 2.4 percentage points, respectively, indicating that the algorithm generalizes well.
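The abstract does not name the adjusted regression loss, but Table 2 lists MPDIoU [12] as the final component. As a minimal sketch of that loss for a single pair of boxes, following the definition in [12] (IoU minus the squared distances between the matching top-left and bottom-right corners, each normalized by the squared image diagonal), the function below may help make the idea concrete. The function name, the `(x1, y1, x2, y2)` box layout, and the `eps` guard are assumptions for illustration, not the authors' implementation.

```python
def mpdiou_loss(pred, target, img_w, img_h, eps=1e-9):
    """MPDIoU loss for one box pair.

    Boxes are (x1, y1, x2, y2) in pixels with x1 < x2 and y1 < y2.
    img_w and img_h are the width and height of the input image.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU; eps avoids division by zero.
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)

    # Squared distances between matching corners.
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2  # top-left corners
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2  # bottom-right corners
    diag_sq = img_w ** 2 + img_h ** 2

    mpdiou = iou - d1_sq / diag_sq - d2_sq / diag_sq
    return 1.0 - mpdiou


# Hypothetical boxes on a 100 x 100 image.
near = mpdiou_loss((0, 0, 10, 10), (20, 20, 30, 30), 100, 100)
far = mpdiou_loss((0, 0, 10, 10), (50, 50, 60, 60), 100, 100)
print(near, far)
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, the corner-distance terms keep the loss growing as the predicted box moves farther from the target, so the loss still distinguishes near misses from distant ones.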
Table 1. Experiment environment configuration
Item                       Configuration
CPU                        Intel Xeon Silver 4310
GPU                        NVIDIA A40
Operating system           Ubuntu 18.04.6
Programming language       Python 3.11.4
Deep learning framework    PyTorch 2.0.1

Table 2. Comparative analysis of ablation experiment
Model               Receptive field conv. attention module  Strong feature fusion network  PConv  MPDIoU  mAP50/%  mAP50:95/%  Parameters/10^6  Floating-point operation rate/(10^9·s^-1)
Baseline            -                                       -                              -      -       95.2     69.1        11.1             28.6
Improvement 1       √                                       -                              -      -       96.2     70.5        11.2             28.8
Improvement 2       √                                       √                              -      -       96.7     71.4        15.1             33.0
Improvement 3       √                                       √                              √      -       96.7     71.4        13.5             24.2
Proposed algorithm  √                                       √                              √      √       97.3     72.0        13.5             24.2

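The drop in parameters and operation rate between Improvement 2 and Improvement 3 comes from PConv [11], which applies a regular k×k convolution to only a fraction of the channels and passes the remaining channels through unchanged. The sketch below counts weights and multiply-accumulates for a single layer, assuming one quarter of the channels are convolved (the ratio used in [11]; the ratio used by this paper is not stated here). The function names and the 40×40×256 feature map are hypothetical.

```python
def conv_cost(h, w, k, c):
    """Weights and multiply-accumulates of a k x k convolution with
    c input and c output channels on an h x w feature map (no bias)."""
    params = k * k * c * c
    macs = h * w * params
    return params, macs


def pconv_cost(h, w, k, c, ratio=0.25):
    """Same count when only c_p = ratio * c channels are convolved and
    the remaining channels are passed through untouched."""
    c_p = int(c * ratio)
    return conv_cost(h, w, k, c_p)


# Hypothetical layer: 40 x 40 feature map, 3 x 3 kernel, 256 channels.
full = conv_cost(40, 40, 3, 256)
partial = pconv_cost(40, 40, 3, 256)
print("parameter ratio:", full[0] / partial[0])
print("MAC ratio:", full[1] / partial[1])
```

For this single layer both counts shrink by a factor of 16. The whole-model change in Table 2 is much smaller (parameters from 15.1×10^6 to 13.5×10^6), presumably because only some layers in the backbone and detection head are replaced.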
Table 3. Attention module comparison results
Table 4. Comparison results of the neck network
Model                             mAP50/%  mAP50:95/%
Baseline                          95.2     69.1
+ Ref. [7]                        96.2     71.2
+ Strong feature fusion network   96.5     71.3

Table 5. Comparison experiment results of loss function
Table 6. Comparison of detection results of different algorithm models
Algorithm             mAP50/%  mAP50:95/%  Parameters/10^6  Floating-point operation rate/(10^9·s^-1)
Faster R-CNN [17]     88.2     51.5        41.1             91.0
SSD512 [18]           95.7     62.1        24.5             87.9
YOLOv3-tiny [19]      94.4     61.0        8.7              13.0
YOLOv5m               96.2     68.8        21.2             49.0
YOLOv6s [20]          96.7     71.0        18.5             45.3
YOLOv7-tiny [21]      96.2     69.4        6.1              13.2
YOLOv8s               95.2     69.1        11.1             28.6
YOLOv9-T [22]         96.6     70.9        2.7              11.0
Deformable-DETR [23]  95.7     60.1        40.0             173.0
RT-DETR-Res18 [24]    96.1     70.4        20.0             60.5
Proposed algorithm    97.3     72.0        13.5             24.2

Table 7. Comparison of detection results of different algorithm models on public datasets
Algorithm             mAP50/%  mAP50:95/%  Parameters/10^6  Floating-point operation rate/(10^9·s^-1)
Faster R-CNN [17]     65.9     36.5        41.2             91.1
SSD512 [18]           65.4     37.6        27.2             90.4
YOLOv3-tiny [19]      54.4     27.9        8.7              13.1
YOLOv5m               77.5     53.5        21.2             49.1
YOLOv6s [20]          75.8     53.5        18.5             45.4
YOLOv7-tiny [21]      70.1     43.4        6.1              13.3
YOLOX-s [25]          72.3     44.7        8.9              26.7
YOLOv8s               76.6     55.4        11.1             28.7
YOLOv9-T [22]         71.8     52.2        2.7              11.1
Deformable-DETR [23]  77.5     52.7        40.0             173.0
RT-DETR-Res18 [24]    72.4     53.1        20.0             60.5
Proposed algorithm    78.0     57.8        13.5             24.3

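The gains quoted in the abstract can be checked against the tables with a few lines of arithmetic. The sketch below assumes YOLOv8s is the baseline, which the source does not say outright but which the matching YOLOv8s row in Table 6 and baseline row in Table 2 suggest. The dictionary layout and the `gains` helper are illustrative.

```python
# (mAP50, mAP50:95) in percent, copied from Tables 6 and 7.
results = {
    "fire dataset (Table 6)": {
        "YOLOv8s": (95.2, 69.1),
        "proposed": (97.3, 72.0),
    },
    "Pascal VOC 07+12 (Table 7)": {
        "YOLOv8s": (76.6, 55.4),
        "proposed": (78.0, 57.8),
    },
}


def gains(rows, baseline="YOLOv8s", model="proposed"):
    """Absolute gain in percentage points for each metric, rounded to 0.1."""
    return tuple(round(m - b, 1) for m, b in zip(rows[model], rows[baseline]))


for name, rows in results.items():
    print(name, gains(rows))
```

Under that assumption the differences come out to 2.1 and 2.9 points on the fire dataset and 1.4 and 2.4 points on Pascal VOC 07+12, matching the abstract; they are absolute differences in mAP rather than relative improvements.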
[1] SHARMA J, GRANMO O C, GOODWIN M, et al. Deep convolutional neural networks for fire detection in images[C]//Proceedings of the Engineering Applications of Neural Networks. Berlin: Springer, 2017: 183-193.
[2] HOSSEINI A, HASHEMZADEH M, FARAJZADEH N. UFS-Net: a unified flame and smoke detection method for early detection of fire in video surveillance applications using CNNs[J]. Journal of Computational Science, 2022, 61: 101638.
[3] ZHANG R, ZHANG W. Fire detection algorithm based on improved GhostNet-FCOS[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(10): 1891-1899 (in Chinese).
[4] QIN R, ZHANG W. Multi-scale fire detection algorithm with an anchor free structure[J]. Journal of Xidian University (Natural Science), 2022, 49(6): 111-119 (in Chinese).
[5] ZHANG X, LIU C, YANG D G, et al. RFAConv: innovating spatial attention and standard convolutional operation[EB/OL]. (2023-04-06)[2024-01-09]. https://doi.org/10.48550/arXiv.2304.03198.
[6] WANG K X, LIEW J H, ZOU Y T, et al. PANet: few-shot image semantic segmentation with prototype alignment[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 9196-9205.
[7] WANG C C, HE W, NIE Y, et al. Gold-YOLO: efficient object detector via gather-and-distribute mechanism[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 51094-51112.
[8] DING X H, ZHANG X Y, MA N N, et al. RepVGG: making VGG-style ConvNets great again[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13728-13737.
[9] WAN Q, HUANG Z L, LU J C, et al. SeaFormer: squeeze-enhanced axial transformer for mobile visual recognition[EB/OL]. (2023-01-30)[2024-01-11]. https://arxiv.org/abs/2301.13156.
[10] ZHANG H, ZU K K, LU J, et al. EPSANet: an efficient pyramid squeeze attention block on convolutional neural network[C]//Proceedings of the Computer Vision–ACCV 2022. Cham: Springer, 2023: 541-557.
[11] CHEN J R, KAO S H, HE H, et al. Run, don’t walk: chasing higher FLOPS for faster neural networks[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2023: 12021-12031.
[12] MA S L, XU Y. MPDIoU: a loss for efficient and accurate bounding box regression[EB/OL]. (2023-07-14)[2024-04-15]. https://arxiv.org/abs/2307.07662.
[13] LIU Y C, SHAO Z R, HOFFMANN N. Global attention mechanism: retain information to enhance channel-spatial interactions[EB/OL]. (2021-12-10)[2024-01-26]. https://arxiv.org/abs/2112.05561.
[14] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the Computer Vision–ECCV 2018. Cham: Springer, 2018: 3-19.
[15] ZHANG Y F, REN W Q, ZHANG Z, et al. Focal and efficient IOU loss for accurate bounding box regression[J]. Neurocomputing, 2022, 506: 146-157.
[16] GEVORGYAN Z. SIoU loss: more powerful learning for bounding box regression[EB/OL]. (2022-05-25)[2024-01-30]. https://arxiv.org/abs/2205.12740.
[17] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Proceedings of the IEEE Transactions on Pattern Analysis and Machine Intelligence. Piscataway: IEEE Press, 2016: 1137-1149.
[18] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the Computer Vision–ECCV 2016. Cham: Springer, 2016: 21-37.
[19] ADARSH P, RATHI P, KUMAR M. YOLOv3-tiny: object detection and recognition using one stage improved model[C]//Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems. Piscataway: IEEE Press, 2020: 687-694.
[20] LI C Y, LI L L, JIANG H L, et al. YOLOv6: a single-stage object detection framework for industrial applications[EB/OL]. (2022-09-07)[2024-02-03]. https://arxiv.org/abs/2209.02976.
[21] WANG C Y, BOCHKOVSKIY A, LIAO H M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2023: 7464-7475.
[22] WANG C Y, YEH I H, LIAO H Y M. YOLOv9: learning what you want to learn using programmable gradient information[EB/OL]. (2024-02-29)[2024-03-05]. https://arxiv.org/abs/2402.13616.
[23] ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. (2021-03-18)[2024-03-06]. https://arxiv.org/abs/2010.04159.
[24] ZHAO Y A, LV W Y, XU S L, et al. DETRs beat YOLOs on real-time object detection[C]//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2024: 16965-16974.
[25] WANG H, WANG L H, CHEN H, et al. Waste-YOLO: towards high accuracy real-time abnormal waste detection in waste-to-energy power plant for production safety[J]. Measurement Science and Technology, 2024, 35(1): 016001.

