Citation: ZHAO Penghui, MENG Chunning, CHANG Shengjiang, et al. Single shot multibox detector based on asynchronous convolution factorization and shunt structure[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(10): 2089-2098. doi: 10.13700/j.bh.1001-5965.2018.0564 (in Chinese)
The single shot multibox detector (SSD) performs the regression computations on its multiple regression feature maps relatively independently of one another, and SSD-based object detection algorithms fail to strike a good balance between detection accuracy and real-time speed. To address these problems, a single shot multibox detector based on asynchronous convolution factorization and shunt structure (FA-SSD) is proposed. Built on the proposed asynchronous convolution factorization algorithm, the shunt structure connects the regression feature layers in a staggered manner, strengthening the unity and coordination among the regression computations. To optimize the high-level mainstream structure, the asynchronous convolution factorization algorithm and max pooling are used to reduce the dimensions of the image features in the mainstream and in the shunt, respectively, which preserves spatial information while increasing feature diversity. Experimental results on VOC2007 test show that, when trained on VOC2007 trainval and VOC2012 trainval at a nominal input resolution of 300×300, FA-SSD achieves a mean average precision (mAP) of 80.5% while running at more than 30 frames per second.
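The abstract does not give the exact layer configuration, so the following is only a minimal shape-and-parameter sketch under stated assumptions. It assumes that "asynchronous convolution factorization" splits a k×k, stride-2 convolution into a 1×k convolution that downsamples only the width, followed by a k×1 convolution that downsamples only the height (a spatial factorization of the kind popularized by Inception-v3, with the strides applied at different steps). It also assumes the shunt uses unpadded 2×2, stride-2 max pooling with ceil rounding, and that the two branches are concatenated along the channel axis. All helper names (`conv_out`, `pool_out`, `async_factorized_shape`, `mainstream_and_shunt`, etc.) are hypothetical. The sketch checks that the mainstream and shunt produce matching spatial sizes and compares the weight count of the factorized pair with that of a plain k×k convolution.

```python
import math


def conv_out(size, kernel, stride, pad):
    """Output length of a convolution along one spatial axis (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output length of unpadded max pooling along one axis, using ceil rounding."""
    return math.ceil((size - kernel) / stride) + 1


def async_factorized_shape(h, w, k=3, stride=2):
    """Hypothetical asynchronous factorization of a k x k, stride-2 convolution."""
    # Stage 1: 1 x k kernel with stride (1, stride) -> only the width shrinks.
    h1, w1 = conv_out(h, 1, 1, 0), conv_out(w, k, stride, k // 2)
    # Stage 2: k x 1 kernel with stride (stride, 1) -> only the height shrinks.
    h2, w2 = conv_out(h1, k, stride, k // 2), conv_out(w1, 1, 1, 0)
    return h2, w2


def params_standard(c_in, c_out, k=3):
    """Weight count of a plain k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def params_factorized(c_in, c_mid, c_out, k=3):
    """Weight count of a 1 x k convolution followed by a k x 1 convolution."""
    return k * c_in * c_mid + k * c_mid * c_out


def mainstream_and_shunt(h, w, c_in, c_out):
    """Spatial size and channel count after concatenating the two branches."""
    main = async_factorized_shape(h, w)             # mainstream: factorized conv
    shunt = (pool_out(h, 2, 2), pool_out(w, 2, 2))  # shunt: 2 x 2 max pooling
    if main != shunt:
        raise ValueError(f"branch sizes differ: {main} vs {shunt}")
    # Channel concatenation: c_out from the conv branch, c_in passed through the pool.
    return main, c_out + c_in


if __name__ == "__main__":
    print(mainstream_and_shunt(19, 19, c_in=512, c_out=256))  # ((10, 10), 768)
    print(params_standard(256, 256), params_factorized(256, 256, 256))  # 589824 393216
```

With `c_mid = c_out = 256`, the factorized pair uses two thirds of the weights of a plain 3×3 convolution; the channel widths here are illustrative only, since the abstract does not state the widths FA-SSD actually uses. Because max pooling keeps the input channels unchanged, the shunt carries the downsampled input features alongside the newly computed ones, which is one plausible reading of the abstract's claim that the design preserves spatial information while diversifying features.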