RGB-T crowd counting method with multi-scale perception and infrared feature enhancement

ZHENG Diwen (郑棣文)¹, SHI Yangyu (石洋宇)¹, XIE Chengjie (谢承杰)¹, LU Shuhua (卢树华)¹,²

doi: 10.13700/j.bh.1001-5965.2024.0250

1. College of Information and Cyber Security, People’s Public Security University of China, Beijing 102600, China
2. Key Laboratory of Security Technology and Risk Assessment, Ministry of Public Security, Beijing 102600, China

Funds: Double First-Class Innovation Research Project for People’s Public Security University of China (2023SYL08)

Corresponding author: E-mail: lushuhua@ppsuc.edu.cn

CLC number: TP391.4

Publication history
- Received: 2024-04-24
- Accepted: 2024-09-13
- Published online: 2024-09-19
- Issue publication date: 2026-06-30

Abstract

RGB-T crowd counting uses the complementary information in visible-light (RGB) and thermal images to generate crowd density maps, so that crowds can still be counted in low-light scenes. Existing RGB-T methods, however, suffer from scale variation and background interference when fusing information across modalities. To address these problems, we propose MSENet, an RGB-T crowd counting method based on multi-scale perception and infrared feature enhancement. MSENet introduces an RGB-T feature fusion mechanism (RTFM) that extracts multi-scale features through a multi-branch structure and includes an infrared enhancement structure designed to fully capture crowd information in thermal images. Dense connections and an information divergence mechanism then pass the complementary features to each modality, so that complementary features are reused and the features of each modality are enhanced. Comparative experiments on the RGBT-CC and ShanghaiTechRGBD datasets show that the proposed method outperforms several existing advanced methods and achieves good accuracy, robustness, and generalization.

Keywords:
- crowd counting
- infrared and visible images
- feature enhancement
- multi-scale perception
- cross-modal feature fusion
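
The abstract describes the output of the network as a crowd density map. As a minimal sketch of what such a map represents (not the authors' preprocessing), the example below builds a density map from head-point annotations by placing one normalized Gaussian per person, so that the map sums to the crowd count. The function name `density_map`, the fixed `sigma`, and the sample coordinates are assumptions made for illustration.

```python
import numpy as np

def density_map(points, height, width, sigma=4.0):
    """Build a density map from (row, col) head annotations.

    Assumption: a fixed-width Gaussian per person, normalized over the
    image so that every annotated person contributes exactly 1.
    """
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    dmap = np.zeros((height, width), dtype=np.float64)
    for r, c in points:
        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        dmap += kernel / kernel.sum()
    return dmap

# Hypothetical annotations for three people in a 64 x 64 image.
heads = [(10, 12), (30, 40), (31, 44)]
dmap = density_map(heads, 64, 64)
estimated_count = dmap.sum()  # integrating the map recovers the count
```

Because each kernel is normalized before it is added, the count is preserved even for people close to the image border.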


Figure 1. RGB images and thermal images under different lighting conditions

Figure 2. Architecture of MSENet

Figure 3. Architecture of RTFM

Figure 4. Cross-modal feature fusion

Figure 5. Architecture of the thermal (infrared) feature enhancement structure

Figure 6. Crowd density maps generated under different illumination conditions

Figure 7. Comparison of ablation experiment results

Table 1. Comparison of results on the RGBT-CC dataset

Method	Framework	GAME(0)	GAME(1)	GAME(2)	GAME(3)	RMSE
IADM+BL [3]	BL	15.61	19.95	24.69	32.89	28.18
CSCA+BL [4]	BL	14.32	18.91	23.81	32.47	26.01
MAT [6]	BL	12.35	16.29	20.81	29.09	22.53
CSA-Net [8]	BL	12.45	16.46	21.48	30.62	21.64
IADM+CSRNet [3]	CSRNet	17.94	21.44	26.17	33.33	30.91
CSA-Net [8]	CSRNet	15.77	19.40	24.14	30.14	29.17
CNCTrans [5]	CNN+Transformer	13.96	17.98	23.03	31.15	24.55
TAFNet [7]	VGG16	12.38	16.98	21.86	30.19	22.45
MSENet (ours)	BL	12.21	16.32	20.69	28.94	21.59
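
The tables report GAME(L) and RMSE. As a minimal sketch of how these metrics are commonly computed, following the Grid Average Mean absolute Error defined in [25]: GAME(L) splits each image into 4^L non-overlapping cells and sums the absolute count error per cell, so GAME(0) equals the mean absolute error of the total count. The function names, the equal-size grid split, and the toy maps below are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def game(pred, gt, level):
    """GAME(level) for one image: absolute count errors summed over a
    2^level x 2^level grid. Assumes pred and gt have the same shape."""
    n = 2 ** level
    h, w = gt.shape
    row_edges = np.linspace(0, h, n + 1).astype(int)
    col_edges = np.linspace(0, w, n + 1).astype(int)
    err = 0.0
    for i in range(n):
        for j in range(n):
            rs, re = row_edges[i], row_edges[i + 1]
            cs, ce = col_edges[j], col_edges[j + 1]
            err += abs(pred[rs:re, cs:ce].sum() - gt[rs:re, cs:ce].sum())
    return float(err)

def evaluate(preds, gts, levels=(0, 1, 2, 3)):
    """Average GAME(L) over images and compute RMSE of the total counts."""
    scores = {
        f"GAME({l})": float(np.mean([game(p, g, l) for p, g in zip(preds, gts)]))
        for l in levels
    }
    count_err = np.array([p.sum() - g.sum() for p, g in zip(preds, gts)])
    scores["RMSE"] = float(np.sqrt(np.mean(count_err ** 2)))
    return scores

# Toy example: the total count is right, but every person is in the wrong quadrant.
gt = np.zeros((8, 8)); gt[1, 1] = 1; gt[6, 6] = 1
pred = np.zeros((8, 8)); pred[1, 6] = 1; pred[6, 1] = 1
scores = evaluate([pred], [gt])
```

In this toy case GAME(0) and RMSE are both zero while GAME(1) is 4, which shows why the higher GAME levels are reported: they penalize localization errors that a global count error hides.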

Table 2. Results under different illumination conditions on the RGBT-CC dataset

Illumination	Method	GAME(0)	GAME(1)	GAME(2)	GAME(3)	RMSE
Bright	IADM+CSRNet [3]	20.36	23.57	28.49	36.29	32.57
Bright	TAFNet [7]	15.57	20.65	26.67	36.17	24.25
Bright	CNCTrans [5]	15.05	19.04	24.21	32.91	25.00
Bright	MSENet (ours)	13.81	17.89	23.26	31.12	24.84
Dark	IADM+CSRNet [3]	15.44	19.23	23.79	30.28	29.11
Dark	TAFNet [7]	14.20	19.20	24.00	31.63	27.50
Dark	CNCTrans [5]	13.34	17.38	21.73	29.16	24.70
Dark	MSENet (ours)	11.62	16.19	20.06	27.63	21.27

Table 3. Comparison of results on the ShanghaiTechRGBD dataset

Method	GAME(0)	GAME(1)	GAME(2)	GAME(3)	RMSE
UC-Net [26]	10.81	15.24	22.04	32.98	15.70
HDFNet [27]	8.32	13.93	17.97	22.62	13.01
BBS-Net [28]	6.26	8.53	11.80	16.46	9.26
IADM+BL [3]	7.13	9.28	13.00	19.53	10.27
CSCA+BL [4]	5.68	7.70	10.45	15.88	8.66
CSA-Net [8]	4.57	5.82	8.02	12.47	6.83
MSENet (ours)	4.75	6.29	8.96	12.04	7.55

Table 4. Comparison of ablation experiment results

Method	GAME(0)	GAME(1)	GAME(2)	GAME(3)	RMSE
Baseline	15.43	18.61	24.17	31.13	25.22
Baseline+Fusion	12.79	16.94	22.02	29.70	22.05
Baseline+Fusion+TES	12.21	16.32	20.69	28.94	21.59
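
To make the ablation numbers easier to read, the short sketch below computes the relative error reduction of each variant with respect to the baseline, using only the GAME(0) and RMSE values from Table 4. The helper name `relative_reduction` and the choice of these two columns are assumptions for illustration.

```python
# GAME(0) and RMSE values copied from Table 4.
ablation = {
    "Baseline": {"GAME(0)": 15.43, "RMSE": 25.22},
    "Baseline+Fusion": {"GAME(0)": 12.79, "RMSE": 22.05},
    "Baseline+Fusion+TES": {"GAME(0)": 12.21, "RMSE": 21.59},
}

def relative_reduction(before, after):
    """Percentage by which an error metric drops from `before` to `after`."""
    return 100.0 * (before - after) / before

base = ablation["Baseline"]
reductions = {
    name: {metric: relative_reduction(base[metric], value)
           for metric, value in row.items()}
    for name, row in ablation.items()
    if name != "Baseline"
}

for name, row in reductions.items():
    print(name, {metric: f"{pct:.1f}%" for metric, pct in row.items()})
```

With these inputs, adding the fusion module lowers GAME(0) by roughly 17% and RMSE by roughly 13% relative to the baseline, and adding TES on top brings the reductions to roughly 21% and 14%.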

Table 5. Comparison of results for different numbers of branches in RTFM on the RGBT-CC dataset

Number of branches	GAME(0)	GAME(1)	GAME(2)	GAME(3)	RMSE
1	17.95	23.61	30.84	39.42	32.26
2	16.22	21.81	28.03	38.51	28.54
3	14.27	18.95	24.16	34.92	24.37
4	12.74	16.94	21.67	30.63	21.81
5	13.47	18.11	23.06	33.03	22.24
6	14.53	19.46	25.23	35.62	23.68

References (28)

[1]	LIU C H, CHEN Y F, HE X Y, et al. A scale aggregation and spatial-aware network for multi-view crowd counting[J]. IEEE Access, 2022, 10: 108604-108613.
[2]	GAO G S, GAO J Y, LIU Q J, et al. CNN-based density estimation and crowd counting: a survey[EB/OL]. (2020-03-28)[2024-01-03]. https://doi.org/10.48550/arXiv.2003.12783.
[3]	LIU L, CHEN J, WU H, et al. Cross-modal collaborative representation learning and a large-scale RGBT benchmark for crowd counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 4821-4831.
[4]	ZHANG Y, CHOI S, HONG S. Spatio-channel attention blocks for cross-modal crowd counting[C]//Proceedings of the Computer Vision-ACCV 2022. Berlin: Springer, 2023: 22-40.
[5]	ZHANG S H, WANG W, ZHAO W B, et al. A cross-modal crowd counting method combining CNN and cross-modal transformer[J]. Image and Vision Computing, 2023, 129: 104592.
[6]	WU Z T, LIU L B, ZHANG Y, et al. Multimodal crowd counting with mutual attention transformers[C]//Proceedings of the IEEE International Conference on Multimedia and Expo. Piscataway: IEEE Press, 2022: 1-6.
[7]	TANG H H, WANG Y, CHAU L P. TAFNet: a three-stream adaptive fusion network for RGB-T crowd counting[C]//Proceedings of the IEEE International Symposium on Circuits and Systems. Piscataway: IEEE Press, 2022: 3299-3303.
[8]	LI H, ZHANG J G, KONG W H, et al. CSA-Net: cross-modal scale-aware attention-aggregated network for RGB-T crowd counting[J]. Expert Systems with Applications, 2023, 213: 119038.
[9]	CHENG J H, CHEN Z J, ZHANG X Y, et al. Exploit the potential of multi-column architecture for crowd counting[EB/OL]. (2020-07-28)[2024-01-30]. https://doi.org/10.48550/arXiv.2007.05779.
[10]	BAI S, HE Z Q, QIAO Y, et al. Adaptive dilated network with self-correction supervision for counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4593-4602.
[11]	LIU L B, QIU Z L, LI G B, et al. Crowd counting with deep structured scale integration network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE Press, 2020: 1774-1783.
[12]	LIU L B, ZHEN J J, LI G B, et al. Dynamic spatial-temporal representation learning for traffic flow prediction[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(11): 7169-7183.
[13]	LIU W Z, SALZMANN M, FUA P. Context-aware crowd counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 5094-5103.
[14]	DENG L J, ZHOU Q H, WANG S H, et al. Deep learning in crowd counting: a survey[J]. CAAI Transactions on Intelligence Technology, 2024, 9(5): 1043-1077.
[15]	LI B, HUANG H B, ZHANG A, et al. Approaches on crowd counting and density estimation: a review[J]. Pattern Analysis and Applications, 2021, 24(3): 853-874.
[16]	JIANG X L, XIAO Z H, ZHANG B C, et al. Crowd counting and density estimation by trellis encoder-decoder networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 6126-6135.
[17]	YU Y, ZHU H L, QIAN J, et al. Survey on deep learning based crowd counting[J]. Journal of Computer Research and Development, 2021, 58(12): 2724-2747 (in Chinese).
[18]	ZHANG Y Y, ZHOU D S, CHEN S Q, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 589-597.
[19]	SAM D B, SURYA S, BABU R V. Switching convolutional neural network for crowd counting[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 4031-4039.
[20]	LI Y H, ZHANG X F, CHEN D M. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1091-1100.
[21]	MA Z H, WEI X, HONG X P, et al. Bayesian loss for crowd count estimation with point supervision[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 6141-6150.
[22]	PENG T, LI Q, ZHU P F. RGB-T crowd counting from drone: a benchmark and MMCCN network[C]//Proceedings of the Computer Vision-ACCV 2020. Berlin: Springer, 2021: 497-513.
[23]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the Computer Vision-ECCV 2018. Berlin: Springer, 2018: 3-19.
[24]	LIAN D Z, LI J, ZHENG J, et al. Density map regression guided detection network for RGB-D crowd counting and localization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 1821-1830.
[25]	GUERRERO-GÓMEZ-OLMEDO R, TORRE-JIMÉNEZ B, LÓPEZ-SASTRE R, et al. Extremely overlapping vehicle counting[C]//Proceedings of the Pattern Recognition and Image Analysis. Berlin: Springer, 2015: 423-431.
[26]	ZHANG J, FAN D P, DAI Y C, et al. UC-Net: uncertainty inspired RGB-D saliency detection via conditional variational autoencoders[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 8579-8588.
[27]	PANG Y W, ZHANG L H, ZHAO X Q, et al. Hierarchical dynamic filtering network for RGB-D salient object detection[C]//Proceedings of the Computer Vision-ECCV 2020. Berlin: Springer, 2020: 235-252.
[28]	FAN D P, ZHAI Y J, BORJI A, et al. BBS-Net: RGB-D salient object detection with a bifurcated backbone strategy network[C]//Proceedings of the Computer Vision-ECCV 2020. Berlin: Springer, 2020: 275-292.