
Object detection algorithm guided by motion information

HU Haimiao, SHEN Liuqing, GAO Likun, LI Mingzhu

Citation: HU Haimiao, SHEN Liuqing, GAO Likun, et al. Object detection algorithm guided by motion information[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(9): 1710-1720. doi: 10.13700/j.bh.1001-5965.2022.0291 (in Chinese)


doi: 10.13700/j.bh.1001-5965.2022.0291

Details
    Corresponding author:

    HU Haimiao, E-mail: frank0139@163.com

  • CLC number: TP391

Object detection algorithm guided by motion information

Funds: 

National Natural Science Foundation of China 62122011

National Natural Science Foundation of China U21A20514

Key Research and Development Program of Zhejiang Province 2022C01082

  • Abstract:

    In outdoor surveillance video, object detection remains challenging: scene complexity and object diversity make targets hard to detect, for example when objects are occluded or vary in scale. To address this, an algorithm is proposed that uses motion information to guide a convolutional-neural-network-based object detector and improve detection accuracy. First, the moving-object detection algorithm is improved so that the resulting motion foreground map retains the foreground of objects that have become stationary. Second, exploiting the fact that foreground regions indicate the spatial locations of objects, the feature maps extracted by the network are fused at the feature level with the motion information, chiefly the motion foreground map, which raises the response of feature-map regions likely to contain objects. Third, a localization branch is introduced into the detector head; it learns a localization confidence for candidate objects from the frame's motion foreground map, and this confidence is combined with the classification confidence by weighted summation to form the final object confidence, after which non-maximum suppression yields the detection results. Experiments show that the proposed algorithm improves detection accuracy on datasets captured by fixed cameras.
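The final-confidence step described above, a weighted sum of classification and localization confidence followed by non-maximum suppression, can be sketched as follows. The weighting factor `alpha`, the (x1, y1, x2, y2) box format, and the IoU threshold are illustrative assumptions; the abstract does not specify the paper's exact values or formulation.

```python
import numpy as np

def fuse_confidence(cls_conf, loc_conf, alpha=0.5):
    """Weighted sum of classification and localization confidence.
    `alpha` is a hypothetical weighting factor, not the paper's value."""
    return alpha * cls_conf + (1.0 - alpha) * loc_conf

def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]  # highest final confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # discard remaining boxes that overlap the kept box too much
        order = np.array([j for j in rest if iou(boxes[i], boxes[j]) < iou_thr])
    return keep
```

Note that with the weighted sum, a box that classifies well but localizes poorly can be overtaken in NMS ranking by a slightly lower-scoring but better-localized box, which is the stated purpose of the localization branch.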

     

  • Figure 1.  Network structure of the proposed method

    Figure 2.  Comparison of foreground map results

    Figure 3.  Network structure of the multi-scale feature fusion module based on motion information

    Figure 4.  Schematic diagram of the multi-scale feature fusion module

    Figure 5.  Detector head structure after introducing the localization branch

    Figure 6.  Baseline method results on the DML_det dataset

    Figure 7.  Results of the proposed algorithm on the DML_det dataset

    Figure 8.  Baseline method results on the DukeMTMC dataset

    Figure 9.  Results of the proposed algorithm on the DukeMTMC dataset

    Figure 10.  Baseline method results on the PETS09 dataset

    Figure 11.  Results of the proposed algorithm on the PETS09 dataset

    Table 1.  Comparison of the proposed algorithm with other object detection algorithms on the DML_det dataset (%)

    Method                 AP@[0.5:0.95]   Recall@[0.5:0.95]
    TOOD[29]               36.3            49.5
    Dynamic R-CNN[30]      36.1            46.5
    ATSS[31]               36.8            48.4
    VarifocalNet[32]       37.2            48.2
    YOLOF[33]              19.0            33.2
    Sparse R-CNN[34]       24.0            44.7
    YOLOX[35]              25.1            36.7
    FoveaBox[36]           25.8            38.5
    Double-Head R-CNN[37]  33.7            41.3
    NAS-FCOS[38]           35.5            46.1
    Proposed method        43.6            57.7

    Table 2.  Comparison of the proposed algorithm with other object detection algorithms on the DukeMTMC dataset (%)

    Method                 AP@[0.5:0.95]   Recall@[0.5:0.95]
    TOOD[29]               59.0            64.9
    Dynamic R-CNN[30]      58.1            63.5
    ATSS[31]               56.3            63.1
    VarifocalNet[32]       56.5            63.0
    YOLOF[33]              40.0            52.4
    Sparse R-CNN[34]       45.0            63.4
    YOLOX[35]              44.6            53.2
    FoveaBox[36]           54.6            60.8
    Double-Head R-CNN[37]  55.7            60.5
    NAS-FCOS[38]           56.8            63.6
    Proposed method        62.0            71.0

    Table 3.  Comparison of the proposed algorithm with other object detection algorithms on the PETS09 dataset (%)

    Method                 AP@[0.5:0.95]   Recall@[0.5:0.95]
    TOOD[29]               38.9            53.8
    Dynamic R-CNN[30]      40.4            52.8
    ATSS[31]               38.9            54.6
    VarifocalNet[32]       39.6            55.3
    YOLOF[33]              36.2            50.4
    Deformable DETR[39]    29.5            45.3
    YOLOX[35]              36.0            47.8
    FoveaBox[36]           36.3            52.3
    Double-Head R-CNN[37]  38.5            50.0
    NAS-FCOS[38]           40.7            56.0
    Proposed method        42.3            57.9

    Table 4.  Performance evaluation indexes on the DML_det dataset

    Method                                                                      Metrics/%
    Cascade R-CNN   Improved ViBe   Multi-scale fusion   Localization branch    AP@[0.5:0.95]   Recall@[0.5:0.95]
    35.5   52.5
    40.1   56.4
    40.7   57.8
    41.9   58.5
    43.6   57.7

    Table 5.  Performance evaluation indexes on the DukeMTMC dataset

    Method                                                                      Metrics/%
    Cascade R-CNN   Improved ViBe   Multi-scale fusion   Localization branch    AP@[0.5:0.95]   Recall@[0.5:0.95]
    58.1   68.2
    60.4   69.3
    60.2   70.3
    61.2   70.7
    62.0   71.0

    Table 6.  Performance evaluation indexes on the PETS09 dataset

    Method                                                                      Metrics/%
    Cascade R-CNN   Improved ViBe   Multi-scale fusion   Localization branch    AP@[0.5:0.95]   Recall@[0.5:0.95]
    38.2   54.8
    40.0   57.2
    39.9   57.3
    41.2   56.7
    42.3   57.9
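The feature-level fusion evaluated in the ablations above can be illustrated with a minimal sketch: a motion foreground map is resized to a feature map's spatial size and used to amplify responses in likely object regions. The nearest-neighbour resize, the multiplicative operator, and the gain `gamma` are assumptions for illustration only; the paper's multi-scale feature fusion module (Fig. 3) is more elaborate.

```python
import numpy as np

def downsample(mask, out_h, out_w):
    """Nearest-neighbour resize of a foreground map to a feature map's
    spatial size (illustrative helper, not the paper's method)."""
    h, w = mask.shape
    ys = np.arange(out_h) * h // out_h  # source row for each output row
    xs = np.arange(out_w) * w // out_w  # source column for each output column
    return mask[ys][:, xs]

def motion_guided_fusion(feature, fg_mask, gamma=1.0):
    """Raise feature responses where the foreground map indicates motion.

    feature: (C, H, W) feature map; fg_mask: (H0, W0) binary foreground map.
    `gamma` is a hypothetical gain controlling how strongly foreground
    regions are emphasized.
    """
    c, h, w = feature.shape
    m = downsample(fg_mask.astype(np.float32), h, w)
    # scale each channel: background regions unchanged, foreground amplified
    return feature * (1.0 + gamma * m)
```

The multiplicative form leaves background responses untouched (mask value 0) while boosting candidate object regions, which matches the abstract's stated goal of increasing the response of feature-map regions that may contain objects.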
  • [1] DOLLAR P, WOJEK C, SCHIELE B, et al. Pedestrian detection: An evaluation of the state of the art[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 34(4): 743-761.
    [2] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2005, 1: 886-893.
    [3] AHONEN T, HADID A, PIETIKÄINEN M. Face recognition with local binary patterns[C]//European Conference on Computer Vision. Berlin: Springer, 2004: 469-481.
    [4] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110. doi: 10.1023/B:VISI.0000029664.99615.94
    [5] SINGLA N. Motion detection based on frame difference method[J]. International Journal of Information & Computation Technology, 2014, 4(15): 1559-1565.
    [6] ZIVKOVIC Z. Improved adaptive Gaussian mixture model for background subtraction[C]//Proceedings of the 17th International Conference on Pattern Recognition. Piscataway: IEEE Press, 2004, 2: 28-31.
    [7] BARNICH O, VAN DROOGENBROECK M. ViBe: A universal background subtraction algorithm for video sequences[J]. IEEE Transactions on Image Processing, 2010, 20(6): 1709-1724.
    [8] PICCARDI M. Background subtraction techniques: A review[C]//2004 IEEE International Conference on Systems, Man and Cybernetics. Piscataway: IEEE Press, 2004, 4: 3099-3104.
    [9] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 21-37.
    [10] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 779-788.
    [11] REDMON J, FARHADI A. YOLO9000: Better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 7263-7271.
    [12] REDMON J, FARHADI A. YOLOv3: An incremental improvement[EB/OL]. (2018-04-08)[2022-04-08]. https://arxiv.org/abs/1804.02767v1.
    [13] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: Optimal speed and accuracy of object detection[EB/OL]. (2020-04-23)[2022-04-08]. https://arxiv.org/abs/2004.10934.
    [14] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2980-2988.
    [15] GIRSHICK R, DONAHUE J, DARRELL T, et al. Region-based convolutional networks for accurate object detection and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(1): 142-158.
    [16] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 1440-1448.
    [17] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
    [18] CAI Z, VASCONCELOS N. Cascade R-CNN: Delving into high quality object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6154-6162.
    [19] ZHU X, XIONG Y, DAI J, et al. Deep feature flow for video recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2349-2358.
    [20] ZHU X, WANG Y, DAI J, et al. Flow-guided feature aggregation for video object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 408-417.
    [21] REZATOFIGHI H, TSOI N, GWAK J Y, et al. Generalized intersection over union: A metric and a loss for bounding box regression[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 658-666.
    [22] JIANG B, LUO R, MAO J, et al. Acquisition of localization confidence for accurate object detection[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 784-799.
    [23] WU S, LI X, WANG X. IoU-aware single-stage object detector for accurate localization[J]. Image and Vision Computing, 2020, 97: 103911.
    [24] NEUBECK A, VAN GOOL L. Efficient non-maximum suppression[C]//Proceedings of the 18th International Conference on Pattern Recognition. Piscataway: IEEE Press, 2006, 3: 850-855.
    [25] WANG X, HU H M, ZHANG Y. Pedestrian detection based on spatial attention module for outdoor video surveillance[C]//2019 IEEE 15th International Conference on Multimedia Big Data (BigMM). Piscataway: IEEE Press, 2019: 247-251.
    [26] ZHANG Z, WU J, ZHANG X, et al. Multi-target, multi-camera tracking by hierarchical clustering: Recent progress on DukeMTMC project[EB/OL]. (2017-11-27)[2022-04-08]. https://arxiv.org/abs/1712.09531.
    [27] FERRYMAN J, SHAHROKNI A. PETS2009: Dataset and challenge[C]//2009 12th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance. Piscataway: IEEE Press, 2009: 1-6.
    [28] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//European Conference on Computer Vision. Berlin: Springer, 2014: 740-755.
    [29] FENG C, ZHONG Y, GAO Y, et al. TOOD: Task-aligned one-stage object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 3490-3499.
    [30] ZHANG H, CHANG H, MA B, et al. Dynamic R-CNN: Towards high quality object detection via dynamic training[C]//European Conference on Computer Vision. Berlin: Springer, 2020: 260-275.
    [31] ZHANG S, CHI C, YAO Y, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 9759-9768.
    [32] ZHANG H, WANG Y, DAYOUB F, et al. VarifocalNet: An iou-aware dense object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 8514-8523.
    [33] CHEN Q, WANG Y, YANG T, et al. You only look one-level feature[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13039-13048.
    [34] SUN P, ZHANG R, JIANG Y, et al. Sparse R-CNN: End-to-end object detection with learnable proposals[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 14454-14463.
    [35] GE Z, LIU S, WANG F, et al. YOLOX: Exceeding YOLO series in 2021[EB/OL]. (2021-08-06)[2022-04-08]. https://arxiv.org/abs/2107.08430.
    [36] KONG T, SUN F, LIU H, et al. FoveaBox: Beyound anchor-based object detection[J]. IEEE Transactions on Image Processing, 2020, 29: 7389-7398.
    [37] WU Y, CHEN Y, YUAN L, et al. Rethinking classification and localization for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 10186-10195.
    [38] WANG N, GAO Y, CHEN H, et al. NAS-FCOS: Fast neural architecture search for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 11943-11951.
    [39] ZHU X, SU W, LU L, et al. Deformable DETR: Deformable transformers for end-to-end object detection[EB/OL]. (2021-03-18)[2022-04-08]. https://arxiv.org/abs/2010.04159v1.
Publication history
  • Received: 28 April 2022
  • Accepted: 13 May 2022
  • Available online: 17 May 2022
