Lightweight video crowd counting with spatial shuffling and chain residual enhancement

CHEN Yong, ZHANG Jiaojiao, ZHOU Fangchun

Citation: CHEN Y, ZHANG J J, ZHOU F C. Lightweight video crowd counting with spatial shuffling and chain residual enhancement[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(2): 397-408 (in Chinese). doi: 10.13700/j.bh.1001-5965.2023.0063

doi: 10.13700/j.bh.1001-5965.2023.0063
Funds: National Natural Science Foundation of China (62462043, 61963023); Key Research and Development Project of Lanzhou Jiaotong University (ZDYF2304)
Corresponding author. E-mail: edukeylab@126.com
  • CLC number: TP391.4

  • Abstract:

    To address the high network-model complexity and the poor accuracy and real-time performance of existing video crowd counting methods, a lightweight video crowd counting method based on spatial shuffling and chain residual enhancement is proposed. The model consists of a multi-scale depthwise separable inverted convolution encoder, a scale regression decoder, and a prediction output layer. In the encoder, multi-scale depthwise separable inverted residual blocks are designed to extract crowd features at different resolutions together with temporal features between adjacent frames, making the model more lightweight; a spatial shuffling module is embedded into the encoder backbone to strengthen the extraction of crowd features at different scales. In the decoder, improved multi-resolution fusion and chain residual modules aggregate the encoder's multi-resolution features layer by layer, reducing the loss of detail features. The decoder regresses a crowd density map, and the count is obtained by summing the density map pixel by pixel. Comparative experiments on the Mall, UCSD, FDST, and ShanghaiTech crowd video datasets show that the proposed method outperforms the compared methods on evaluation indicators such as frame rate and parameter count. On the Mall dataset, relative to the ConvLSTM crowd counting method, the proposed method reduces the mean absolute error (MAE) and mean squared error (MSE) by 43.75% and 72.71%, respectively, giving higher accuracy and better real-time performance for video crowd counting across different scenes.
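    As a concrete illustration of the architecture the abstract describes, the following is a minimal PyTorch sketch of one multi-scale depthwise separable inverted residual block and of the pixel-wise density-map summation that produces the count. The class name, expansion factor, kernel sizes (3/5/7), channel widths, and the stand-in density head are all illustrative assumptions, not the paper's exact configuration (which is specified in Figs. 2-4).

```python
import torch
import torch.nn as nn

class MultiScaleDWInvertedResidual(nn.Module):
    """Inverted residual block with mixed depthwise kernels:
    1x1 expansion -> parallel 3x3/5x5/7x7 depthwise convolutions on
    channel groups -> 1x1 projection, with a skip connection when the
    input and output widths match. All hyperparameters here are
    illustrative assumptions, not the paper's values."""

    def __init__(self, in_ch, out_ch, expand=4, kernels=(3, 5, 7)):
        super().__init__()
        mid = in_ch * expand
        self.expand = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
        )
        # Split the expanded channels into one group per kernel size.
        self.splits = [mid // len(kernels)] * len(kernels)
        self.splits[-1] += mid - sum(self.splits)
        self.dw = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2, groups=c, bias=False)
            for c, k in zip(self.splits, kernels)
        )
        self.project = nn.Sequential(
            nn.Conv2d(mid, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.use_res = in_ch == out_ch

    def forward(self, x):
        h = self.expand(x)
        groups = h.split(self.splits, dim=1)
        h = torch.cat([conv(g) for conv, g in zip(self.dw, groups)], dim=1)
        h = self.project(h)
        return x + h if self.use_res else h

# The decoder regresses a density map, which is summed pixel by pixel to
# give the count (the head below is a stand-in, not the paper's decoder).
block = MultiScaleDWInvertedResidual(16, 16)
features = block(torch.randn(1, 16, 120, 160))        # one video frame
density_map = torch.relu(features.mean(dim=1, keepdim=True))
count = density_map.sum(dim=(1, 2, 3))                # predicted head count
```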

     

  • Figure 1.  Structure of overall network model

    Figure 2.  Structure of multi-scale depthwise separable inverted convolution encoder

    Figure 3.  Structure of mixed depthwise separable convolution

    Figure 4.  Structure of multi-scale depthwise separable inverted residual block

    Figure 5.  Spatial shuffling module

    Figure 6.  Structure of Ghost convolution module (hedged code sketches of the modules in Figures 5 and 6 follow this figure list)

    Figure 7.  Space group normalization module

    Figure 8.  Structure of chain residual module

    Figure 9.  Structure of multi-resolution fusion module

    Figure 10.  Experimental results of Mall dataset

    Figure 11.  Experimental results of UCSD dataset

    Figure 12.  Experimental results of FDST dataset

    Figure 13.  Experimental results of ShanghaiTech Part A dataset

    Figure 14.  Experimental results of ShanghaiTech Part B dataset
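    For Figures 5 and 6, the sketch below shows the two building blocks in their commonly published forms: a Ghost convolution module as introduced in GhostNet, and a ShuffleNet-style channel shuffle of the kind a spatial shuffling module typically builds on. The ratio and kernel parameters are assumed defaults; the paper's variants may differ in detail.

```python
import math
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution as in GhostNet: a small primary conv produces
    'intrinsic' maps; cheap depthwise convs derive 'ghost' maps from them.
    ratio=2 and a 3x3 cheap kernel are the usual defaults (assumed here)."""

    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = math.ceil(out_ch / ratio)
        ghost_ch = init_ch * (ratio - 1)
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, ghost_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True),
        )
        self.out_ch = out_ch

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)[:, :self.out_ch]

def channel_shuffle(x, groups):
    """ShuffleNet-style shuffle: interleave channels across groups so
    that grouped convolutions can exchange information."""
    n, c, h, w = x.shape
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(1, 2)
             .reshape(n, c, h, w))

if __name__ == "__main__":
    x = torch.randn(1, 32, 60, 80)
    y = channel_shuffle(GhostConv(32, 64)(x), groups=4)
    print(y.shape)  # torch.Size([1, 64, 60, 80])
```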

    Table 1.  Comparison of evaluation indicators of Mall dataset

    Method                               MAE    MSE
    Gaussian process regression[10]      3.72   20.1
    Ridge regression[11]                 3.59   19.0
    Cumulative attribute regression[14]  3.43   17.7
    ConvLSTM[15]                         2.24   8.5
    STDNet[16]                           1.47   2.88
    Proposed method                      1.26   2.32
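    The MAE and MSE columns in Tables 1-4 follow the definitions customary in crowd counting work, where the reported MSE is conventionally a root-mean-square quantity. Assuming those standard definitions:

```latex
\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\lvert \hat{c}_i-c_i\rvert ,\qquad
\mathrm{MSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{c}_i-c_i\right)^{2}}
```

    where $\hat{c}_i$ and $c_i$ are the predicted and ground-truth counts of frame $i$ over $N$ test frames. The abstract's reductions relative to ConvLSTM follow directly from Table 1:

```latex
\frac{2.24-1.26}{2.24}=43.75\,\%,\qquad
\frac{8.5-2.32}{8.5}\approx 72.71\,\%
```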

    Table 2.  Comparison of evaluation indicators of UCSD dataset

    Method                               MAE    MSE
    Gaussian process regression[10]      2.24   7.97
    Ridge regression[11]                 2.25   7.82
    Cumulative attribute regression[14]  2.07   6.90
    ConvLSTM[15]                         1.30   1.79
    STDNet[16]                           1.13   1.43
    Proposed method                      1.05   1.38

    Table 3.  Comparison of evaluation indicators of FDST dataset

    Method           MAE    MSE
    ConvLSTM[15]     4.5    5.8
    LST[12]          3.4    4.5
    STANet[19]       3.6    5.1
    EPF-C[17]        2.2    2.6
    GNANet[20]       2.1    2.9
    Proposed method  2.06   2.53

    Table 4.  Comparison of evaluation indicators of ShanghaiTech dataset

    Method           MAE              MSE
                     Part A   Part B  Part A   Part B
    MCNN[13]         110.2    26.4    173.2    41.4
    CMTL[21]         101.3    20.0    152.4    31.1
    CSRNet[22]       68.2     10.6    115.0    16.0
    SANet[23]        67.0     8.4     104.5    13.6
    LigMSANet[24]    76.6     11.0    121.4    19.0
    SAFECount[25]    73.70    9.98    119.4    18.3
    Proposed method  65.4     8.2     102.0    12.4

    Table 5.  Comparison of model lightweight indicators

    Method           Frame rate/(frame·s⁻¹)  Parameters   Floating-point operations/GFLOPs
    MCNN[13]         27.5                    0.13×10⁶     56.21
    CMTL[21]         8.0                     2.46×10⁶     243.80
    CSRNet[22]       5.2                     16.26×10⁶    857.84
    SANet[23]        7.3                     1.39×10⁶     182.26
    Proposed method  32.5                    3.47×10⁶     19.42
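    Table 5's parameter counts and frame rates can be reproduced for any PyTorch model with generic helpers like the ones below; the input resolution, warm-up length, and device handling are assumptions, not the paper's benchmark protocol.

```python
import time
import torch

def count_parameters(model: torch.nn.Module) -> int:
    """Total learnable parameters, comparable to Table 5's 'Parameters' column."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def measure_fps(model: torch.nn.Module, input_shape=(1, 3, 480, 640),
                warmup: int = 5, iters: int = 50) -> float:
    """Rough frames-per-second estimate. A faithful benchmark would also
    pin the device, synchronize CUDA, and use the paper's input resolution."""
    model.eval()
    x = torch.randn(*input_shape)
    for _ in range(warmup):           # let lazy initialization settle
        model(x)
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    return iters / (time.perf_counter() - start)
```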

    Table 6.  Comparison of ablation experimental indicators

    MS-DSR  SSM  G-SSM  SGN  CRP   MAE    MSE
                                   89.4   146.0
                                   87.1   133.8
                                   82.0   128.2
                                   75.9   119.9
                                   71.5   113.7
                                   65.4   102.0
  • [1] GAO H, ZHAO W J, ZHANG D X, et al. Application of improved Transformer based on weakly supervised in crowd localization and crowd counting[J]. Scientific Reports, 2023, 13(1): 1144.
    [2] YU Y, ZHU H L, QIAN J, et al. Survey on deep learning based crowd counting[J]. Journal of Computer Research and Development, 2021, 58(12): 2724-2747 (in Chinese). doi: 10.7544/issn1000-1239.2021.20200699
    [3] CHAN A B, VASCONCELOS N. Bayesian Poisson regression for crowd counting[C]//Proceedings of the IEEE 12th International Conference on Computer Vision. Piscataway: IEEE Press, 2009: 545-551.
    [4] MIAO Y Q, HAN J G, GAO Y S, et al. ST-CNN: Spatial-temporal convolutional neural network for crowd counting in videos[J]. Pattern Recognition Letters, 2019, 125: 113-118. doi: 10.1016/j.patrec.2019.04.012
    [5] WU X J, XU B H, ZHENG Y B, et al. Fast video crowd counting with a temporal aware network[J]. Neurocomputing, 2020, 403: 13-20. doi: 10.1016/j.neucom.2020.04.071
    [6] BAI H Y, CHAN S H G. Motion-guided non-local spatial-temporal network for video crowd counting[EB/OL]. (2021-04-28)[2023-02-01]. https://arxiv.org/abs/2104.13946v1.
    [7] CAI Y Q, MA Z W, LU C H, et al. Global representation guided adaptive fusion network for stable video crowd counting[J]. IEEE Transactions on Multimedia, 2022, 25: 5222-5233.
    [8] TAN M X, LE Q V. MixConv: Mixed depthwise convolutional kernels[EB/OL]. (2019-12-01)[2023-02-01]. https://arxiv.org/abs/1907.09595v3.
    [9] WANG P, GAO C Y, WANG Y, et al. MobileCount: An efficient encoder-decoder framework for real-time crowd counting[J]. Neurocomputing, 2020, 407: 292-299. doi: 10.1016/j.neucom.2020.05.056
    [10] CHAN A B, LIANG Z S J, VASCONCELOS N. Privacy preserving crowd monitoring: Counting people without people models or tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2008: 1-7.
    [11] CHEN K, LOY C C, GONG S G, et al. Feature mining for localised crowd counting[C]//Proceedings of the British Machine Vision Conference. Guildford: British Machine Vision Association, 2012.
    [12] FANG Y Y, ZHAN B Y, CAI W D, et al. Locality-constrained spatial Transformer network for video crowd counting[C]//Proceedings of the IEEE International Conference on Multimedia and Expo. Piscataway: IEEE Press, 2019: 814-819.
    [13] ZHANG Y Y, ZHOU D S, CHEN S Q, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 589-597.
    [14] CHEN K, GONG S G, XIANG T, et al. Cumulative attribute space for age and crowd density estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2013: 2467-2474.
    [15] XIONG F, SHI X J, YEUNG D Y. Spatiotemporal modeling for crowd counting in videos[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 5161-5169.
    [16] MA Y J, SHUAI H H, CHENG W H. Spatiotemporal dilated convolution with uncertain matching for video-based crowd estimation[J]. IEEE Transactions on Multimedia, 2021, 24: 261-273.
    [17] LIU W Z, SALZMANN M, FUA P. Estimating people flows to better count them in crowded scenes[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 723-740.
    [18] WU Z, ZHANG X F, TIAN G, et al. Spatial-temporal graph network for video crowd counting[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(1): 228-241. doi: 10.1109/TCSVT.2022.3187194
    [19] WEN L Y, DU D W, ZHU P F, et al. Drone-based joint density map estimation, localization and tracking with space-time multi-scale attention network[EB/OL]. (2019-12-04)[2023-02-01]. https://arxiv.org/abs/1912.01811.
    [20] LI H P, LIU L B, YANG K L, et al. Video crowd localization with multifocus Gaussian neighborhood attention and a large-scale benchmark[J]. IEEE Transactions on Image Processing, 2022, 31: 6032-6047. doi: 10.1109/TIP.2022.3205210
    [21] SINDAGI V A, PATEL V M. CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting[C]//Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance. Piscataway: IEEE Press, 2017: 1-6.
    [22] LI Y H, ZHANG X F, CHEN D M. CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1091-1100.
    [23] CAO X K, WANG Z P, ZHAO Y Y, et al. Scale aggregation network for accurate and efficient crowd counting[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 757-773.
    [24] JIANG G Q, WU R, HUO Z Q, et al. LigMSANet: Lightweight multi-scale adaptive convolutional neural network for dense crowd counting[J]. Expert Systems with Applications, 2022, 197: 116662. doi: 10.1016/j.eswa.2022.116662
    [25] YOU Z Y, YANG K, LUO W H, et al. Few-shot object counting with similarity-aware feature enhancement[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2023: 6304-6313.
Publication history
  • Received: 2023-02-20
  • Accepted: 2023-04-27
  • Available online: 2023-07-03
  • Issue date: 2025-02-28
