
Single shot multibox detector based on asynchronous convolution factorization and shunt structure

ZHAO Penghui, MENG Chunning, CHANG Shengjiang

Citation: ZHAO Penghui, MENG Chunning, CHANG Shengjiang, et al. Single shot multibox detector based on asynchronous convolution factorization and shunt structure[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(10): 2089-2098. doi: 10.13700/j.bh.1001-5965.2018.0564 (in Chinese)


Funds: Technology Research Project of Public Security Ministry (2017JSYJC10)

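About the authors:

    ZHAO Penghui, male, master's degree candidate. Research interests: video object detection, deep learning.

    MENG Chunning, male, Ph.D., associate professor. Research interests: image processing, artificial intelligence, information security.

    CHANG Shengjiang, male, Ph.D., professor, doctoral supervisor. Research interests: digital image processing, terahertz functional devices, deep learning.

    Corresponding author:

    MENG Chunning, E-mail: mengchunning123@163.com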

CLC number: TP391.4

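Abstract:

    In the SSD object detection network, the regression computations on the multiple regression feature maps are performed relatively independently of one another, and the SSD-based improved algorithms find it difficult to stay real-time while raising detection accuracy. To address these problems, a single-stage object detector based on asynchronous convolution factorization and a shunt structure is proposed. A shunt structure built on an asynchronous convolution factorization algorithm interleaves connections between multiple feature map layers, strengthening the consistency and coordination among the regression computations. The original high-level main-stream structure is also optimized: the main-stream structure reduces feature map size by max pooling, while the shunt structure does so by asynchronous convolution factorization, which preserves spatially correlated information and increases feature diversity. Experimental results show that, when trained on the VOC2007 trainval and VOC2012 trainval images uniformly resized to 300×300 pixels, the proposed detector reaches a mean average precision (mAP) of 80.5% on VOC2007 test at a detection speed of more than 30 frames/s.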
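The abstract does not define asynchronous convolution factorization, and the figure that illustrates it is not reproduced in this text. The sketch below is an interpretation, not the authors' stated design: it assumes the method builds on the asymmetric factorization of ref [19] (an n×n convolution replaced by a 1×n convolution followed by an n×1 convolution) and that "asynchronous" means the two factors halve the width and the height in separate steps. It compares the output size of that path with 2×2 max pooling (the pooling size is also assumed) and counts weights against a single 3×3 stride-2 convolution. The function names, the 38×38 input, kernel size 3, padding 1 and 256 channels are illustrative choices.

```python
# Hypothetical sketch: two ways of halving a feature map.
# ASSUMPTION: the "asynchronous" factorization is modeled as a 1 x k conv that
# strides along the width only, followed by a k x 1 conv that strides along
# the height only. This is an illustration, not the authors' implementation.


def out_size(n: int, k: int, s: int, p: int) -> int:
    """Output length along one axis for kernel k, stride s, padding p."""
    return (n + 2 * p - k) // s + 1


def max_pool_shape(h: int, w: int, k: int = 2, s: int = 2) -> tuple:
    """Main-stream path: parameter-free k x k max pooling (2 x 2 assumed)."""
    return out_size(h, k, s, 0), out_size(w, k, s, 0)


def async_factorized_shape(h: int, w: int, k: int = 3, p: int = 1) -> tuple:
    """Assumed shunt path: 1 x k conv, stride (1, 2), then k x 1 conv, stride (2, 1)."""
    h1, w1 = h, out_size(w, k, 2, p)    # step 1 halves the width only
    h2, w2 = out_size(h1, k, 2, p), w1  # step 2 halves the height only
    return h2, w2


def weights_full(k: int, c_in: int, c_out: int) -> int:
    """Weights of one k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def weights_factorized(k: int, c_in: int, c_mid: int, c_out: int) -> int:
    """Weights of a 1 x k conv (c_in -> c_mid) plus a k x 1 conv (c_mid -> c_out)."""
    return k * c_in * c_mid + k * c_mid * c_out


if __name__ == "__main__":
    print(max_pool_shape(38, 38))                # (19, 19)
    print(async_factorized_shape(38, 38))        # (19, 19)
    print(weights_full(3, 256, 256))             # 589824
    print(weights_factorized(3, 256, 256, 256))  # 393216
```

Under these assumptions both paths map 38×38 to 19×19, so their outputs have matching spatial size, and the factorized pair uses two-thirds of the weights of a single 3×3 convolution (6C² versus 9C² for C input and output channels).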

     

Figure 1. Computation procedure of the SSD algorithm
Figure 2. Network structure of SSD based on the VGG front end
Figure 3. Network structure of FA-SSD
Figure 4. Asynchronous convolution factorization algorithm
Figure 5. Shunt structure between regression feature map layers
Figure 6. Local structure of the high-level network
Figure 7. Data augmentation
Figure 8. Loss variation curves
Figure 9. Mean average precision at different numbers of iterations
Figure 10. Influence of the number of shunt structures on detection
Figure 11. Optimization of the high-level network
Figure 12. Influence of high-level structure optimization on detection
Figure 13. Partial detection results of FA-SSD300 on VOC2007 test

Table 1. Detection results of different algorithms on VOC2007 test

| Algorithm | Training data | Pre-training | Backbone | Input size | Proposals | GPU | Speed/(frame·s⁻¹) | mAP/% |
|---|---|---|---|---|---|---|---|---|
| Fast R-CNN[8] | 07+12 | | VGGNet | 600×1000* | 300 | K40 | 3.125 | 66.9 |
| Faster R-CNN[9] | 07+12 | | VGGNet | 600×1000* | 300 | K40 | 5 | 73.2 |
| R-FCN[23] | 07+12 | | VGGNet | 600×1000 | 300 | K40 | 5.8 | 75.6 |
| YOLOv2[12] | 07+12 | | Darknet-19 | 352×352 | | Titan X | 81 | 73.7 |
| SSD300[13] | 07+12 | × | VGGNet | 300×300 | 8732 | Titan X | 46 | 74.3 |
| SSD300[13] | 07+12 | | VGGNet | 300×300 | 8732 | Titan X | 46 | 77.2 |
| SSD300*[13] | 07+12 | × | VGGNet | 300×300 | 8732 | 1080Ti | 43.5 | 74.0 |
| DSOD300[16] | 07+12 | × | DS/64-192-48-1 | 300×300 | 8732 | Titan X | 17.4 | 77.7 |
| DSSD321[15] | 07+12 | | ResNet | 321×321 | 17080 | Titan X | 9.5 | 78.6 |
| FA-SSD300 | 07+12 | × | VGGNet | 300×300 | 8732 | 1080Ti | 30 | 79.0 |
| FA-SSD300 | 07+12 | | VGGNet | 300×300 | 8732 | 1080Ti | 30 | 80.5 |

Note: 07+12 denotes training on VOC2007 trainval plus VOC2012 trainval.
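The 8732 in the Proposals column for SSD300 is the number of default boxes SSD tiles over its six regression feature maps. The sketch below reproduces that count from the SSD300 layout of ref [13] (maps of 38, 19, 10, 5, 3 and 1 cells per side with 4, 6, 6, 6, 4 and 4 boxes per cell); the identical count listed for FA-SSD300 suggests, but does not confirm, that the same layout is kept. The names `SSD300_LAYOUT` and `count_default_boxes` are illustrative.

```python
# SSD300 default-box layout as described for the original SSD (ref [13]):
# (source feature map, cells per side, default boxes per cell).
# ASSUMPTION: FA-SSD300 keeps this layout; the paper is not quoted on this.
SSD300_LAYOUT = [
    ("conv4_3", 38, 4),
    ("fc7", 19, 6),
    ("conv8_2", 10, 6),
    ("conv9_2", 5, 6),
    ("conv10_2", 3, 4),
    ("conv11_2", 1, 4),
]


def count_default_boxes(layout):
    """Return ({layer: boxes}, total) for (name, side, boxes_per_cell) tuples."""
    per_layer = {name: side * side * k for name, side, k in layout}
    return per_layer, sum(per_layer.values())


per_layer, total = count_default_boxes(SSD300_LAYOUT)
for name, n in per_layer.items():
    print(f"{name:>8}: {n:5d}")
print("total:", total)  # total: 8732
```

The 38×38 map alone contributes 5776 boxes, about two-thirds of the total, so the lowest regression layer dominates the box count.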

Table 2. Per-class detection comparison on VOC2007 test (AP/%)

| Class | Fast R-CNN[8] | Faster R-CNN[9] | ION[22] | R-FCN[23] | MR-CNN[24] | SSD300[13] | DSSD321[15] | FA-SSD300 |
|---|---|---|---|---|---|---|---|---|
| Aero | 77.0 | 76.5 | 79.2 | 79.9 | 80.3 | 79.5 | 81.9 | 86.4 |
| Bike | 78.1 | 79.0 | 83.1 | 87.2 | 84.1 | 83.9 | 84.9 | 85.9 |
| Bird | 69.3 | 70.9 | 77.6 | 81.5 | 78.5 | 76.0 | 80.5 | 79.6 |
| Boat | 59.4 | 65.5 | 65.6 | 72.0 | 70.8 | 69.6 | 68.4 | 73.3 |
| Bottle | 38.3 | 52.1 | 54.9 | 69.8 | 68.5 | 50.4 | 53.9 | 53.6 |
| Bus | 81.6 | 83.1 | 85.4 | 86.8 | 88.0 | 87.0 | 85.6 | 90.2 |
| Car | 78.6 | 84.7 | 85.1 | 88.5 | 85.9 | 85.7 | 86.2 | 89.2 |
| Cat | 86.7 | 86.4 | 87.0 | 89.8 | 87.8 | 88.1 | 88.9 | 91.7 |
| Chair | 42.8 | 52.0 | 54.4 | 67.0 | 60.3 | 60.3 | 61.1 | 60.0 |
| Cow | 78.8 | 81.9 | 80.6 | 88.1 | 85.2 | 81.5 | 83.5 | 84.3 |
| Table | 68.9 | 65.7 | 73.8 | 74.5 | 73.7 | 77.0 | 78.7 | 80.9 |
| Dog | 84.7 | 84.8 | 85.3 | 89.8 | 87.2 | 86.1 | 86.7 | 89.1 |
| Horse | 82.0 | 84.6 | 82.2 | 90.6 | 86.5 | 87.5 | 88.7 | 87.4 |
| Mbike | 76.6 | 77.5 | 82.2 | 79.9 | 85.0 | 83.97 | 86.7 | 86.5 |
| Person | 69.9 | 76.7 | 74.4 | 81.2 | 76.4 | 79.4 | 79.7 | 83.3 |
| Plant | 31.8 | 38.8 | 47.1 | 53.7 | 48.5 | 52.3 | 51.7 | 54.2 |
| Sheep | 70.1 | 73.6 | 75.8 | 81.8 | 76.3 | 77.9 | 78.0 | 83.2 |
| Sofa | 74.8 | 73.9 | 72.7 | 81.5 | 75.5 | 79.4 | 80.9 | 82.3 |
| Train | 80.4 | 83.0 | 84.2 | 85.9 | 85.0 | 87.6 | 87.9 | 89.2 |
| Tv | 70.4 | 72.6 | 80.4 | 79.9 | 81.0 | 76.8 | 79.4 | 78.5 |
| mAP/% | 70.0 | 73.2 | 75.6 | 80.5 | 78.2 | 77.2 | 78.6 | 80.5 |
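Under the PASCAL VOC protocol [4], mAP is the unweighted mean of the 20 per-class APs. As a quick consistency check on Table 2, the sketch below averages the FA-SSD300 column and picks out its weakest and strongest classes. The mean of the listed values is 80.44, close to the reported 80.5 given that the per-class values are themselves rounded to one decimal. The dictionary and function names are illustrative.

```python
# Per-class AP (%) of FA-SSD300 on VOC2007 test, copied from Table 2.
FA_SSD300_AP = {
    "aero": 86.4, "bike": 85.9, "bird": 79.6, "boat": 73.3, "bottle": 53.6,
    "bus": 90.2, "car": 89.2, "cat": 91.7, "chair": 60.0, "cow": 84.3,
    "table": 80.9, "dog": 89.1, "horse": 87.4, "mbike": 86.5, "person": 83.3,
    "plant": 54.2, "sheep": 83.2, "sofa": 82.3, "train": 89.2, "tv": 78.5,
}


def mean_ap(ap_by_class):
    """mAP: unweighted mean of the per-class average precisions."""
    return sum(ap_by_class.values()) / len(ap_by_class)


m = mean_ap(FA_SSD300_AP)
hardest = min(FA_SSD300_AP, key=FA_SSD300_AP.get)
easiest = max(FA_SSD300_AP, key=FA_SSD300_AP.get)
print(f"mAP = {m:.2f}")                          # mAP = 80.44
print("lowest:", hardest, "highest:", easiest)   # lowest: bottle highest: cat
```

Small, thin objects (bottle, plant, chair) remain the weakest classes for FA-SSD300, as they are for the other detectors in the table.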
References
    [1] VIOLA P, JONES M. Rapid object detection using a boosted cascade of simple features[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2001: 511-518.
    [2] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2005: 886-893.
    [3] FELZENSZWALB P, MCALLESTER D, RAMANAN D. A discriminatively trained, multiscale, deformable part model[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2008: 1-8.
    [4] EVERINGHAM M, GOOL L V, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
    [5] LI X D, YE M, LI T. Review of object detection based on convolutional neural networks[J]. Application Research of Computers, 2017, 34(10): 2881-2886 (in Chinese). doi: 10.3969/j.issn.1001-3695.2017.10.001
    [6] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2014: 580-587.
    [7] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
    [8] GIRSHICK R. Fast R-CNN[C]//IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE Press, 2015: 1440-1448.
    [9] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99.
    [10] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2017: 936-944.
    [11] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2016: 779-788.
    [12] REDMON J, FARHADI A. YOLO9000: Better, faster, stronger[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2017: 6517-6525.
    [13] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 21-37.
    [14] REDMON J, FARHADI A. YOLOv3: An incremental improvement[EB/OL]. (2018-04-08)[2018-09-21]. http://cn.arxiv.org/pdf/1804.02767v1.
    [15] FU C Y, LIU W, RANGA A, et al. DSSD: Deconvolutional single shot detector[EB/OL]. (2017-01-23)[2018-09-21]. http://cn.arxiv.org/pdf/1701.06659.
    [16] SHEN Z, LIU Z, LI J, et al. DSOD: Learning deeply supervised object detectors from scratch[C]//IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE Press, 2017: 1937-1945.
    [17] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-03-10)[2018-09-21]. http://cn.arxiv.org/pdf/1409.1556.
    [18] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2016: 770-778.
    [19] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2016: 2818-2826.
    [20] IOFFE S, SZEGEDY C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[EB/OL]. (2015-03-02)[2018-09-26]. https://arxiv.org/abs/1502.03167.
    [21] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
    [22] BELL S, ZITNICK C L, BALA K, et al. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2016: 2874-2883.
    [23] DAI J, LI Y, HE K, et al. R-FCN: Object detection via region-based fully convolutional networks[EB/OL]. (2016-06-21)[2018-09-26]. https://arxiv.org/abs/1605.06409.
    [24] HE K, GKIOXARI G, DOLLAR P, et al. Mask R-CNN[C]//IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE Press, 2017: 1-13.
Publication history
  • Received: 2018-09-27
  • Accepted: 2019-05-18
  • Published online: 2019-10-20
