留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

一种迭代聚合的高分辨率网络Anchor-free目标检测方法

王新 李喆 张宏立

王新, 李喆, 张宏立等 . 一种迭代聚合的高分辨率网络Anchor-free目标检测方法[J]. 北京航空航天大学学报, 2021, 47(12): 2533-2541. doi: 10.13700/j.bh.1001-5965.2020.0484
引用本文: 王新, 李喆, 张宏立等 . 一种迭代聚合的高分辨率网络Anchor-free目标检测方法[J]. 北京航空航天大学学报, 2021, 47(12): 2533-2541. doi: 10.13700/j.bh.1001-5965.2020.0484
WANG Xin, LI Zhe, ZHANG Hongliet al. High-resolution network Anchor-free object detection method based on iterative aggregation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(12): 2533-2541. doi: 10.13700/j.bh.1001-5965.2020.0484(in Chinese)
Citation: WANG Xin, LI Zhe, ZHANG Hongliet al. High-resolution network Anchor-free object detection method based on iterative aggregation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(12): 2533-2541. doi: 10.13700/j.bh.1001-5965.2020.0484(in Chinese)

一种迭代聚合的高分辨率网络Anchor-free目标检测方法

doi: 10.13700/j.bh.1001-5965.2020.0484
基金项目: 

国家自然科学基金 51767022

国家自然科学基金 51967019

详细信息
    通讯作者:

    李喆, E-mail: zhlxju@163.com

  • 中图分类号: V221+.3;TB553

High-resolution network Anchor-free object detection method based on iterative aggregation

Funds: 

National Natural Science Foundation of China 51767022

National Natural Science Foundation of China 51967019

More Information
  • 摘要:

    针对目前Anchor-free目标检测方法CenterNet(ObjectsasPoints)生成热力图不准确、检测精度不足的问题,提出了一种基于特征迭代聚合的高分辨率表征网络CenterNet-DHRNet。首先,引入高分辨率表征骨干网络,并用迭代聚合的方式对不同分辨率的特征图进行融合,提高网络的分辨率,有效减少图像在下采样过程中损失的空间语义信息。其次,使用高效通道注意力机制对高分辨率表征骨干网络的输出进行优化。最后,利用结合空洞卷积的空间金字塔池化操作增强网络对不同尺度物体的感受野。实验在PASCALVOC数据集和KITTI数据集上进行,结果表明:CenterNet-DHRNet精度更高,满足实时检测的性能要求,具有良好的鲁棒性。

     

  • 图 1  标注示意图

    Figure 1.  Schematic diagram of annotation

    图 2  CenterNet总体结构

    Figure 2.  Architecture of CenterNet

    图 3  原始HRNet的特征融合方式

    Figure 3.  Feature fusion method of original HRNet

    图 4  迭代聚合的特征融合方式

    Figure 4.  Feature fusion method of iterative aggregation

    图 5  高效通道注意力机制

    Figure 5.  Efficient channel attention network

    图 6  Espp结构

    Figure 6.  Structure of Espp

    图 7  CenterNet-DHRNet方法结构

    Figure 7.  Structure of proposed CenterNet-DHRNet algorithm

    图 8  Loss曲线对比

    Figure 8.  Comparison of loss curve

    图 9  CenterNet-DHRNet与原CenterNet在PASCAL VOC数据集上结果对比

    Figure 9.  Comparison of results between CenterNet-DHRNet and original CenterNet on PASCAL VOC dataset

    图 10  CenterNet-DHRNet在KITTI数据集上的检测结果

    Figure 10.  Detection results of CenterNet-DHRNet on KITTI dataset

    表  1  PASCAL VOC数据集测试结果

    Table  1.   Test results of PASCAL VOC dataset

    方法 图像分辨率 mAP/% FPS
    FasterR-CNN-Res101[5] 600×1 000 76.4 5
    FasterR-CNN-Vgg16[5] 600×1 000 73.2 7
    R-FCN[23] 600×1 000 80.5 9
    Yolov3 544×544 79.3 26
    SSD300[7] 300×300 77.2 45
    SSD500[7] 513×513 78.9 19
    FCOS[9] 800×800 80.2 16
    ExtremeNet[10] 512×512 79.5 3.1
    DSSD[22] 513×513 81.5 5.5
    CenterNet-Res18[11] 512×512 75.7 96*
    CenterNet-Res101[11] 512×512 78.7 27*
    CenterNet-DLA[11] 512×512 80.7 30*
    CenterNet-HRNet 512×512 79.0 21
    本文 512×512 81.9 18
    注:“*”表示在本机环境下的运行结果。
    下载: 导出CSV

    表  2  PASCAL VOC数据集上训练时间、模型复杂度对比

    Table  2.   Comparison of training time and model complexity on PASCAL VOC dataset

    方法 GPUs 训练时间/h 模型大小/Mbit
    CenterNet 2×TITAN V 15 80.8
    本文 1×1 080TI 55 189.1
    下载: 导出CSV

    表  3  不同方法在PASCAL VOC数据集上每个类别的AP比较

    Table  3.   AP comparison of different algorithms for each category on PASCAL VOC dataset

    类别 本文 CenterNet-DLA[11] CenterNet-HRNet Faster R-CNN[5] Mask R-CNN R-FCN[23] SSD300[7]
    mAP 81.9 80.7 79.0 73.2 78.2 80.5 77.2
    飞机 86.2 85.0 86.1 76.5 80.3 79.9 77.7
    自行车 88.6 86.0 83.9 79.0 84.1 87.2 84.0
    82.4 81.4 77.1 70.9 78.5 81.5 76.3
    72.8 72.8 70.2 65.5 70.8 72.0 71.3
    杯子 73.4 68.4 71.0 52.1 68.5 69.8 48.6
    公共汽车 86.6 86.0 84.6 83.1 88.0 86.8 85.3
    小轿车 88.8 88.4 88.1 84.7 85.9 88.5 86.3
    87.3 86.5 84.6 86.4 87.8 89.8 87.3
    椅子 68.1 65.0 65.63 52.0 60.3 67.0 59.3
    86.9 86.3 84.9 81.9 85.2 88.1 81.8
    餐桌 78.4 77.6 74.3 65.7 73.7 74.5 77.2
    84.6 85.2 82.6 84.8 87.2 89.8 85.2
    88.5 87.0 86.0 84.6 86.5 90.6 87.0
    摩托车 86.5 86.1 82.6 77.5 85.0 79.9 83.7
    86.0 85.0 84.3 76.7 76.4 81.2 79.2
    盆栽植物 59.0 58.1 55.4 38.8 48.5 53.7 53.0
    85.3 83.4 82.3 73.6 76.3 81.8 76.3
    沙发 81.5 79.6 73.1 73.9 75.5 81.5 79.8
    火车 87.5 85.0 84.0 83.0 85.0 85.9 88.3
    电视 80.2 80.3 79.4 72.6 81.0 79.9 76.2
    下载: 导出CSV

    表  4  在PASCAL VOC数据集上的消融实验

    Table  4.   Ablation experiment on PASCAL VOC dataset

    HRNet 迭代聚合 ECANet Espp mAP/%
    79.0
    80.9
    81.6
    81.9
    下载: 导出CSV

    表  5  KITTI数据集上不同目标检测方法对比

    Table  5.   Comparison of different object detection algorithms on KITTI dataset

    方法 输入 FPS mAP%
    Yolov3 544 26 84.9
    CenterNet 512 30 86.1
    本文 512 18 87.1
    下载: 导出CSV

    表  6  KITTI数据集上3类目标AP对比

    Table  6.   Comparison of three types of object AP on KITTI dataset

    方法 AP/% mAP/%
    Car Pedestrian Cyclist
    CenterNet 95.4 78.3 84.6 86.1
    本文 96.7 79.1 85.5 87.1
    下载: 导出CSV
  • [1] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110. doi: 10.1023%2FB%3AVISI.0000029664.99615.94.pdf
    [2] 蒋弘毅, 王永娟. 目标检测模型及其优化方法综述[J]. 自动化学报, 2021, 47(6): 1232-1255. https://www.cnki.com.cn/Article/CJFDTOTAL-MOTO202106004.htm

    JIANG H Y, WANG Y J. A survey of object detection models and its optimization methods[J]. Acta Automatica Sinica, 2021, 47(6): 1232-1255(in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-MOTO202106004.htm
    [3] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2014: 580-587.
    [4] GIRSHICK R. Fast R-CNN[C]//International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 1440-1448.
    [5] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017(6): 1137-1149. doi: 10.1109/TPAMI.2016.2577031
    [6] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2016: 779-788.
    [7] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 21-37.
    [8] LAW H, DENG J. CornerNet: Detecting objects as paired keypoints[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 765-781.
    [9] TIAN Z, SHEN C, CHEN H, et al. FCOS: Fully convolutional one-stage object detection[C]//International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 9626-9635.
    [10] ZHOU X, ZHUO J, KRAHENBUHL P, et al. Bottom-up object detection by grouping extreme and center points[EB/OL]. (2019-04-25)[2020-08-20]. https://arxiv.org/abs/1901.08043.
    [11] ZHOU X, WANG D, KRAHENBUHL P, et al. Objects as points[EB/OL]. (2019-04-25)[2020-08-20]. https://arxiv.org/abs/1904.07850.
    [12] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2016: 770-778.
    [13] XIAO B, WU H, WEI Y, et al. Simple baselines for human pose estimation and tracking[EB/OL]. (2018-08-21)[2020-08-20]. https://arxiv.org/abs/1804.06208v2.
    [14] YU F, WANG D, SHELHAMER E, et al. Deep layer aggregation[EB/OL]. (2019-01-04)[2020-08-20]. https://arxiv.org/abs/1707.06484.
    [15] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2020-08-20]. https://arxiv.org/abs/1409.1556.
    [16] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[EB/OL]. [2020-08-20]. https://arxiv.org/abs/1902.09212.
    [17] WANG Q, WU B, ZHU P, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 11534-11542.
    [18] HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 7132-7141.
    [19] CHEN L, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[EB/OL]. (2018-03-08)[2020-08-20]. https://arxiv.org/abs/1802.02611v2.
    [20] YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[EB/OL]. (2016-04-30)[2020-08-20]. https://arxiv.org/abs/1511.07122.
    [21] LIN T, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[EB/OL]. (2018-03-07)[2020-08-20]. https://arxiv.org/abs/1708.02002.
    [22] FU C, LIU W, RANGA A, et al. DSSD: Deconvolutional single shot detector[EB/OL]. [2020-08-20]. https://arxiv.org/abs/1701.06659.
    [23] DAI J, LI Y, HE K, et al. R-FCN: Object detection via region-based fully convolutional networks[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems December. New York: ACM, 2016: 379-387.
  • 加载中
图(10) / 表(6)
计量
  • 文章访问数:  338
  • HTML全文浏览量:  99
  • PDF下载量:  68
  • 被引次数: 0
出版历程
  • 收稿日期:  2020-09-01
  • 录用日期:  2020-10-30
  • 网络出版日期:  2021-12-20

目录

    /

    返回文章
    返回
    常见问答