Abnormal behavior detection method based on multi-modal feature fusion

XIAO Bo (肖波), GUO Fang (郭放), WANG Rong (王蓉), ZENG Zhaolong (曾昭龙)

Citation: XIAO B, GUO F, WANG R, et al. Abnormal behavior detection method based on multi-modal feature fusion[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(12): 4370-4378 (in Chinese). doi: 10.13700/j.bh.1001-5965.2024.0455

doi: 10.13700/j.bh.1001-5965.2024.0455
Funding: Double First-class Security Project of People's Public Security University of China (2023SYL08)

Article information
    Corresponding author:

    E-mail: guofang@ppsuc.edu.cn

  • CLC number: V328.3; TP181

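  • Abstract:

    To address false detections, missed detections, and the imbalance between positive and negative samples in video abnormal behavior detection, an abnormal behavior detection method based on multi-modal feature fusion is proposed. A cross-modal perception module is designed that fuses features with a cross-attention mechanism to improve the representation of cross-modal data, while a parameter-sharing strategy reduces the number of network parameters. The network is trained with an improved binary cross-entropy loss that dynamically lowers the weight of easily distinguished samples during training and concentrates larger weights on hard-to-distinguish samples, which improves the handling of imbalanced, hard-to-classify data and raises detection accuracy. A sample batch selection strategy uses statistical analysis to pick out more abnormal snippets, which addresses the problem of abnormal snippets being missed during selection. The method is tested on the public XD-Violence and Shanghai-Tech datasets and on a custom dataset. It reaches an AP of 85.32% on XD-Violence and AUC values of 96.84% and 81.73% on Shanghai-Tech and the custom dataset, respectively. These results demonstrate the effectiveness and generalization ability of the proposed method.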
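    The re-weighting described in the abstract (less weight on easy samples, more on hard ones) resembles the well-known focal-loss modification of binary cross-entropy. The sketch below uses that standard form as an assumption; it is not the paper's exact formula, and `gamma` and `alpha` are illustrative hyperparameters that do not come from the source.

```python
import math

def plain_bce(p, y, eps=1e-7):
    """Standard binary cross-entropy for one prediction."""
    p = min(max(p, eps), 1.0 - eps)           # clamp for numerical safety
    return -math.log(p if y == 1 else 1.0 - p)

def focal_bce(p, y, gamma=2.0, alpha=0.5, eps=1e-7):
    """Focal-style binary cross-entropy for one prediction.

    p: predicted anomaly probability in (0, 1); y: label, 0 or 1.
    gamma and alpha are assumed values, not taken from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks toward 0 for confidently correct samples,
    # so easy samples contribute little and hard samples dominate the loss.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_bce(0.95, 1)   # abnormal snippet scored confidently: easy sample
hard = focal_bce(0.30, 1)   # abnormal snippet scored low: hard sample
print(easy / plain_bce(0.95, 1), hard / plain_bce(0.30, 1))
```

    The two printed ratios are the effective weights applied to the plain loss. The easy sample is scaled down far more strongly than the hard one, which is the behavior the abstract describes.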

     

  • Figure 1.  Abnormal behavior detection model framework based on multi-modal feature fusion

    Figure 2.  Cross-modal perception diagram

    Figure 3.  Different abnormal snippet selection strategies

    Figure 4.  ROC curve of the model on the custom dataset

    Figure 5.  Accuracy and loss curves during training

    Figure 6.  Anomaly score visualization on XD-Violence

    Figure 7.  Anomaly score visualization on the custom dataset

    Table 1.  Experimental platform configuration

    Item                        Configuration
    Operating system            Ubuntu 20.04
    GPU                         A100
    CPU                         32-core CPU, 16 GB memory
    Deep learning framework     PyTorch + TensorFlow
    Programming language        Python 3.8
    GPU acceleration platform   CUDA 11.0

    Table 2.  Comparison of experimental results on the XD-Violence dataset

    Feature      Method         Venue      AP/%
    I3D          RTFM[21]       ICCV'21    77.81
    I3D          MSL[16]        AAAI'22    78.28
    I3D          CU-Net         CVPR'23    78.74
    I3D          SAS[19]        arXiv'23   83.59
    I3D          UR-DMU[22]     CVPR'23    81.66
    I3D+VGGish   CU-Net[21]     CVPR'23    81.43
    I3D+VGGish   UR-DMU[22]     CVPR'23    81.77
    I3D+VGGish   MACIL-SD[14]   MM'22      83.40
    I3D+VGGish   BN-WVAD[15]    arXiv'23   85.26
    I3D+VGGish   Ours           -          85.32
     Note: bold marks the best value in each column.

    Table 3.  Comparison of experimental results on the Shanghai-Tech dataset

    Method        Feature      AUC/%
    Mem-AE[23]    -            71.20
    AMP-Net[24]   -            78.80
    MIST[7]       I3D RGB      94.83
    AR-Net[26]    I3D RGB      85.38
    S3R[27]       I3D RGB      97.48
    UML[25]       X-CLIP RGB   96.78
    Ours          I3D RGB      96.84
     Note: bold marks the best value in each column.
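    The result tables report AP and AUC. As a reminder of what these numbers measure, the sketch below computes both from anomaly scores and binary labels in plain Python. The scores and labels are made-up toy values, and the functions assume at least one abnormal and one normal item. The benchmarks themselves are typically evaluated at frame level with library implementations, so this is only an illustration of the definitions.

```python
def roc_auc(scores, labels):
    """AUC: probability that a random abnormal item outscores a random normal one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count winning pairs; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """AP: mean of the precision measured at the rank of each abnormal item."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank       # precision among the top `rank` items
    return total / hits

# Toy example (hypothetical values): label 1 = abnormal, 0 = normal.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
ap = average_precision(scores, labels)
print(round(auc, 4), round(ap, 4))
```

    In this toy case one normal item (score 0.8) is ranked above one abnormal item (score 0.7), which costs one of the six abnormal/normal pairs in the AUC and lowers the precision at the second abnormal item in the AP.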

    Table 4.  Ablation study results

    Module     Cross-Attention   RF Loss   SBS   AP/%
    baseline                                     83.40
    0                                            83.57
    1                                            84.75
    2                                            84.23
    3                                            84.82
    4                                            84.73
    5                                            83.89
    6                                            85.32
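    The Cross-Attention component in the ablation corresponds to the cross-modal perception module, which the abstract describes as cross-attention fusion with shared parameters. The following is a minimal single-head sketch of that idea in plain Python. The toy features, the identity projection matrices, and the concatenation used for fusion are all assumptions made for illustration; the source does not specify these details.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)                                  # subtract max for stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context, w_q, w_k, w_v):
    """Each query row attends over the rows of the other modality (context)."""
    q, k, v = matmul(queries, w_q), matmul(context, w_k), matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v), weights

# Hypothetical toy inputs: 3 visual and 3 audio snippets with 2-D features.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
shared = (identity, identity, identity)           # one parameter set, reused below

vis_from_audio, w_va = cross_attention(visual, audio, *shared)
aud_from_visual, w_av = cross_attention(audio, visual, *shared)
# Fuse by concatenating the two attended features of each snippet.
fused = [a + b for a, b in zip(vis_from_audio, aud_from_visual)]
print(len(fused), len(fused[0]))
```

    Reusing the same `shared` projections for both attention directions is what keeps the parameter count down in this sketch: a second direction adds no new weights.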
  • [1] XU H T. Research on abnormal behavior detection in video based on deep learning[D]. Beijing: Beijing Jiaotong University, 2022, 31(2): 1-7.
    [2] BERMEJO NIEVAS E, DENIZ SUAREZ O, BUENO GARCÍA G, et al. Violence detection in video using computer vision techniques[C]//Proceedings of the Computer Analysis of Images and Patterns. Berlin: Springer Berlin Heidelberg, 2011: 332-339.
    [3] DENIZ O, SERRANO I, BUENO G, et al. Fast violence detection in video[C]//Proceedings of the 9th International Conference on Computer Vision Theory and Applications. Lisbon: SCITEPRESS-Science and Technology Publications, 2014: 478-485.
    [4] LI A. Research on the key techniques of group abnormal behavior detection in surveillance video[D]. Beijing: Beijing Jiaotong University, 2021, 31(2): 1-4.
    [5] MA S, ZENG Z, MCDUFF D, et al. Active contrastive learning of audio-visual video representations[EB/OL]. (2020-08-31)[2024-05-10]. https://doi.org/10.48550/arXiv.2009.09805.
    [6] FANG Z, WANG J, WANG L, et al. SEED: self-supervised distillation for visual representation[EB/OL]. (2021-01-12)[2024-05-11]. https://doi.org/10.48550/arXiv.2101.04731.
    [7] FENG J C, HONG F T, ZHENG W S. MIST: multiple instance self-training framework for video anomaly detection[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 14004-14013.
    [8] PANG W F, HE Q H, HU Y J, et al. Violence detection in videos based on fusing visual and audio information[C]//Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2021: 2260-2264.
    [9] PEIXOTO B, LAVI B, BESTAGINI P, et al. Multimodal violence detection in videos[C]//Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2020: 2957-2961.
    [10] CHEN L Q, WANG D, GAN Z, et al. Wasserstein contrastive representation distillation[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 16291-16300.
    [11] CHEN T, KORNBLITH S, NOROUZI M, et al. A simple framework for contrastive learning of visual representations[C]//Proceedings of the International Conference on Machine Learning. New York: PMLR, 2020: 1597-1607.
    [12] LI S, LIU F, JIAO L C. Self-training multi-sequence learning with transformer for weakly supervised video anomaly detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(2): 1395-1403. doi: 10.1609/aaai.v36i2.20028
    [13] HAN K, XIAO A, WU E, et al. Transformer in transformer[J]. Advances in Neural Information Processing Systems, 2021, 34: 15908-15919.
    [14] WU P, LIU J, SHI Y J, et al. Not only look, but also listen: learning multimodal violence detection under weak supervision[C]//Proceedings of the Computer Vision–ECCV 2020. Cham: Springer International Publishing, 2020: 322-339.
    [15] YU J S, LIU J Y, CHENG Y, et al. Modality-aware contrastive instance learning with self-distillation for weakly-supervised audio-visual violence detection[C]//Proceedings of the 30th ACM International Conference on Multimedia. New York: ACM, 2022: 6278-6287.
    [16] ZHOU Y X, QU Y, XU X, et al. BatchNorm-based weakly supervised video anomaly detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34: 13642-13654. doi: 10.1109/TCSVT.2024.3450734
    [17] CHEN H L, XIE W D, VEDALDI A, et al. VGGSound: a large-scale audio-visual dataset[C]//Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2020: 721-725.
    [18] LIU J Y, CHENG Y, ZHANG Y J, et al. Self-supervised video representation learning with motion-contrastive perception[C]//Proceedings of the 2022 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE Press, 2022: 1-6.
    [19] SCHÖLKOPF B, WILLIAMSON R C, SMOLA A, et al. Support vector method for novelty detection[J]. Advances in Neural Information Processing Systems, 1999, 12: 582-588.
    [20] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the International Conference on Machine Learning. New York: PMLR, 2015: 448-456.
    [21] FAN Y D, YU Y X, LU W H, et al. Weakly-supervised video anomaly detection with snippet anomalous attention[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(7): 5480-5492. doi: 10.1109/TCSVT.2024.3350084
    [22] TIAN Y, PANG G S, CHEN Y H, et al. Weakly-supervised video anomaly detection with robust temporal feature magnitude learning[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 4955-4966.
    [23] ZHOU H, YU J Q, YANG W. Dual memory units with uncertainty regulation for weakly supervised video anomaly detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(3): 3769-3777. doi: 10.1609/aaai.v37i3.25489
    [24] GONG D, LIU L Q, LE V, et al. Memorizing normality to detect anomaly: memory-augmented deep autoencoder for unsupervised anomaly detection[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 1705-1714.
    [25] LIU Y, LIU J, YANG K, et al. AMP-net: appearance-motion prototype network assisted automatic video anomaly detection system[J]. IEEE Transactions on Industrial Informatics, 2024, 20(2): 2843-2855. doi: 10.1109/TII.2023.3298476
    [26] WAN B Y, FANG Y M, XIA X, et al. Weakly supervised video anomaly detection via center-guided discriminative learning[C]//Proceedings of the 2020 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE Press, 2020: 1-6.
    [27] LV H, YUE Z Q, SUN Q R, et al. Unbiased multiple instance learning for weakly supervised video anomaly detection[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2023: 8022-8031.
Publication history
  • Received: 2024-06-21
  • Accepted: 2024-09-06
  • Published online: 2024-09-19
  • Issue published: 2025-12-31
