RGB-T crowd counting method with multi-scale perception and infrared feature enhancement
Abstract: RGB-T crowd counting uses the complementary information in visible-light (RGB) and thermal images to generate crowd density maps, which makes counting feasible in low-light scenes. Existing RGB-T crowd counting methods, however, suffer from scale variation and background interference during cross-modal information fusion. To address these problems, we propose MSENet, an RGB-T crowd counting method based on multi-scale perception and infrared feature enhancement. MSENet introduces an RGB-T feature fusion mechanism (RTFM) that extracts multi-scale features through a multi-branch structure and includes an infrared enhancement structure designed to fully capture the crowd information in thermal images. Dense connections and an information divergence mechanism then pass the complementary features to each modality, so that complementary representations are reused and the features of each modality are enhanced. We compare MSENet with existing methods on the RGBT-CC and ShanghaiTechRGBD datasets. The results show that MSENet outperforms several existing state-of-the-art methods, particularly on RGBT-CC, with good accuracy, robustness, and generalization.
Table 1. Comparison results on the RGBT-CC dataset

| Method | Framework | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
|---|---|---|---|---|---|---|
| IADM+BL [3] | BL | 15.61 | 19.95 | 24.69 | 32.89 | 28.18 |
| CSCA+BL [4] | BL | 14.32 | 18.91 | 23.81 | 32.47 | 26.01 |
| MAT [6] | BL | 12.35 | 16.29 | 20.81 | 29.09 | 22.53 |
| CSA-Net [8] | BL | 12.45 | 16.46 | 21.48 | 30.62 | 21.64 |
| IADM+CSRNet [3] | CSRNet | 17.94 | 21.44 | 26.17 | 33.33 | 30.91 |
| CSA-Net [8] | CSRNet | 15.77 | 19.40 | 24.14 | 30.14 | 29.17 |
| CNCTrans [5] | CNN+Transformer | 13.96 | 17.98 | 23.03 | 31.15 | 24.55 |
| TAFNet [7] | VGG16 | 12.38 | 16.98 | 21.86 | 30.19 | 22.45 |
| MSENet (ours) | BL | 12.21 | 16.32 | 20.69 | 28.94 | 21.59 |

Table 2. Results under different illumination conditions on the RGBT-CC dataset

| Illumination | Method | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
|---|---|---|---|---|---|---|
| Bright | IADM+CSRNet [3] | 20.36 | 23.57 | 28.49 | 36.29 | 32.57 |
| Bright | TAFNet [7] | 15.57 | 20.65 | 26.67 | 36.17 | 24.25 |
| Bright | CNCTrans [5] | 15.05 | 19.04 | 24.21 | 32.91 | 25.00 |
| Bright | MSENet (ours) | 13.81 | 17.89 | 23.26 | 31.12 | 24.84 |
| Dark | IADM+CSRNet [3] | 15.44 | 19.23 | 23.79 | 30.28 | 29.11 |
| Dark | TAFNet [7] | 14.20 | 19.20 | 24.00 | 31.63 | 27.50 |
| Dark | CNCTrans [5] | 13.34 | 17.38 | 21.73 | 29.16 | 24.70 |
| Dark | MSENet (ours) | 11.62 | 16.19 | 20.06 | 27.63 | 21.27 |

Table 3. Comparison results on the ShanghaiTechRGBD dataset

| Method | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
|---|---|---|---|---|---|
| UC-Net [26] | 10.81 | 15.24 | 22.04 | 32.98 | 15.70 |
| HDFNet [27] | 8.32 | 13.93 | 17.97 | 22.62 | 13.01 |
| BBS-Net [28] | 6.26 | 8.53 | 11.80 | 16.46 | 9.26 |
| IADM+BL [3] | 7.13 | 9.28 | 13.00 | 19.53 | 10.27 |
| CSCA+BL [4] | 5.68 | 7.70 | 10.45 | 15.88 | 8.66 |
| CSA-Net [8] | 4.57 | 5.82 | 8.02 | 12.47 | 6.83 |
| MSENet (ours) | 4.75 | 6.29 | 8.96 | 12.04 | 7.55 |

Table 4. Comparison results of ablation experiments

| Method | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
|---|---|---|---|---|---|
| Baseline | 15.43 | 18.61 | 24.17 | 31.13 | 25.22 |
| Baseline+Fusion | 12.79 | 16.94 | 22.02 | 29.70 | 22.05 |
| Baseline+Fusion+TES | 12.21 | 16.32 | 20.69 | 28.94 | 21.59 |

Table 5. Comparison results of different branch numbers in RTFM on the RGBT-CC dataset

| Number of branches | GAME(0) | GAME(1) | GAME(2) | GAME(3) | RMSE |
|---|---|---|---|---|---|
| 1 | 17.95 | 23.61 | 30.84 | 39.42 | 32.26 |
| 2 | 16.22 | 21.81 | 28.03 | 38.51 | 28.54 |
| 3 | 14.27 | 18.95 | 24.16 | 34.92 | 24.37 |
| 4 | 12.74 | 16.94 | 21.67 | 30.63 | 21.81 |
| 5 | 13.47 | 18.11 | 23.06 | 33.03 | 22.24 |
| 6 | 14.53 | 19.46 | 25.23 | 35.62 | 23.68 |

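The tables report GAME(0) to GAME(3) and RMSE without defining them. The sketch below shows how these metrics are commonly computed, assuming the standard Grid Average Mean absolute Error definition: each density map is split into 4^L non-overlapping cells, the absolute count error is summed over cells, and the result is averaged over images, so GAME(0) equals the mean absolute error of the total count. The function names and the toy 2x2 density maps are illustrative only and are not taken from the paper's implementation.

```python
import math


def region_sums(density, level):
    """Split a 2D density map into 2**level x 2**level cells; return each cell's sum."""
    rows, cols = len(density), len(density[0])
    n = 2 ** level
    sums = []
    for i in range(n):
        r0, r1 = i * rows // n, (i + 1) * rows // n
        for j in range(n):
            c0, c1 = j * cols // n, (j + 1) * cols // n
            sums.append(sum(density[r][c]
                            for r in range(r0, r1)
                            for c in range(c0, c1)))
    return sums


def game(preds, gts, level):
    """GAME(level): per-cell absolute count error, summed over cells, averaged over images."""
    total = 0.0
    for pred, gt in zip(preds, gts):
        cells_p = region_sums(pred, level)
        cells_g = region_sums(gt, level)
        total += sum(abs(a - b) for a, b in zip(cells_p, cells_g))
    return total / len(preds)


def rmse(preds, gts):
    """Root mean squared error of the total count per image."""
    sq_errors = [(sum(map(sum, p)) - sum(map(sum, g))) ** 2
                 for p, g in zip(preds, gts)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy data (hypothetical): the counts match, but people are placed in the wrong cells.
pred = [[1, 0], [0, 1]]
gt = [[0, 1], [1, 0]]

print(game([pred], [gt], 0))  # total-count error only
print(game([pred], [gt], 1))  # also penalizes misplaced density
print(rmse([pred], [gt]))
```

In the toy case the total counts agree, so GAME(0) and RMSE are zero while GAME(1) is not. This is why the higher GAME levels in the tables are always at least as large as GAME(0): they additionally penalize density placed in the wrong region.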
[1] LIU C H, CHEN Y F, HE X Y, et al. A scale aggregation and spatial-aware network for multi-view crowd counting[J]. IEEE Access, 2022, 10: 108604-108613.
[2] GAO G S, GAO J Y, LIU Q J, et al. CNN-based density estimation and crowd counting: a survey[EB/OL]. (2020-03-28)[2024-01-03]. https://doi.org/10.48550/arXiv.2003.12783.
[3] LIU L, CHEN J, WU H, et al. Cross-modal collaborative representation learning and a large-scale RGBT benchmark for crowd counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 4821-4831.
[4] ZHANG Y, CHOI S, HONG S. Spatio-channel attention blocks for cross-modal crowd counting[C]//Proceedings of the Computer Vision-ACCV 2022. Berlin: Springer, 2023: 22-40.
[5] ZHANG S H, WANG W, ZHAO W B, et al. A cross-modal crowd counting method combining CNN and cross-modal transformer[J]. Image and Vision Computing, 2023, 129: 104592.
[6] WU Z T, LIU L B, ZHANG Y, et al. Multimodal crowd counting with mutual attention transformers[C]//Proceedings of the IEEE International Conference on Multimedia and Expo. Piscataway: IEEE Press, 2022: 1-6.
[7] TANG H H, WANG Y, CHAU L P. TAFNet: a three-stream adaptive fusion network for RGB-T crowd counting[C]//Proceedings of the IEEE International Symposium on Circuits and Systems. Piscataway: IEEE Press, 2022: 3299-3303.
[8] LI H, ZHANG J G, KONG W H, et al. CSA-Net: cross-modal scale-aware attention-aggregated network for RGB-T crowd counting[J]. Expert Systems with Applications, 2023, 213: 119038.
[9] CHENG J H, CHEN Z J, ZHANG X Y, et al. Exploit the potential of multi-column architecture for crowd counting[EB/OL]. (2020-07-28)[2024-01-30]. https://doi.org/10.48550/arXiv.2007.05779.
[10] BAI S, HE Z Q, QIAO Y, et al. Adaptive dilated network with self-correction supervision for counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4593-4602.
[11] LIU L B, QIU Z L, LI G B, et al. Crowd counting with deep structured scale integration network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE Press, 2020: 1774-1783.
[12] LIU L B, ZHEN J J, LI G B, et al. Dynamic spatial-temporal representation learning for traffic flow prediction[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(11): 7169-7183.
[13] LIU W Z, SALZMANN M, FUA P. Context-aware crowd counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 5094-5103.
[14] DENG L J, ZHOU Q H, WANG S H, et al. Deep learning in crowd counting: a survey[J]. CAAI Transactions on Intelligence Technology, 2024, 9(5): 1043-1077.
[15] LI B, HUANG H B, ZHANG A, et al. Approaches on crowd counting and density estimation: a review[J]. Pattern Analysis and Applications, 2021, 24(3): 853-874.
[16] JIANG X L, XIAO Z H, ZHANG B C, et al. Crowd counting and density estimation by trellis encoder-decoder networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6126-6135.
[17] YU Y, ZHU H L, QIAN J, et al. Survey on deep learning based crowd counting[J]. Journal of Computer Research and Development, 2021, 58(12): 2724-2747 (in Chinese).
[18] ZHANG Y Y, ZHOU D S, CHEN S Q, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 589-597.
[19] SAM D B, SURYA S, BABU R V. Switching convolutional neural network for crowd counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 4031-4039.
[20] LI Y H, ZHANG X F, CHEN D M. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1091-1100.
[21] MA Z H, WEI X, HONG X P, et al. Bayesian loss for crowd count estimation with point supervision[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 6141-6150.
[22] PENG T, LI Q, ZHU P F. RGB-T crowd counting from drone: a benchmark and MMCCN network[C]//Proceedings of the Computer Vision-ACCV 2020. Berlin: Springer, 2021: 497-513.
[23] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the Computer Vision-ECCV 2018. Berlin: Springer, 2018: 3-19.
[24] LIAN D Z, LI J, ZHENG J, et al. Density map regression guided detection network for RGB-D crowd counting and localization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 1821-1830.
[25] GUERRERO-GÓMEZ-OLMEDO R, TORRE-JIMÉNEZ B, LÓPEZ-SASTRE R, et al. Extremely overlapping vehicle counting[C]//Proceedings of the Pattern Recognition and Image Analysis. Berlin: Springer, 2015: 423-431.
[26] ZHANG J, FAN D P, DAI Y C, et al. UC-Net: uncertainty inspired RGB-D saliency detection via conditional variational autoencoders[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 8579-8588.
[27] PANG Y W, ZHANG L H, ZHAO X Q, et al. Hierarchical dynamic filtering network for RGB-D salient object detection[C]//Proceedings of the Computer Vision-ECCV 2020. Berlin: Springer, 2020: 235-252.
[28] FAN D P, ZHAI Y J, BORJI A, et al. BBS-Net: RGB-D salient object detection with a bifurcated backbone strategy network[C]//Proceedings of the Computer Vision-ECCV 2020. Berlin: Springer, 2020: 275-292.

