Journal of Beijing University of Aeronautics and Astronautics, 2020, Vol. 46, Issue (9): 1786-1796. doi: 10.13700/j.bh.1001-5965.2020.0069

Trans-scale feature aggregation network for multiscale pedestrian detection

CAO Shuai, ZHANG Xiaowei, MA Jianwei

  1. College of Computer Science and Technology, Qingdao University, Qingdao 266071, China
  • Received: 2020-03-02 Published: 2020-09-22
  • Corresponding author: ZHANG Xiaowei E-mail: xiaowei19870119@sina.com
  • About the authors: CAO Shuai, male, master's student; main research interests: pedestrian detection and computer vision. ZHANG Xiaowei, male, Ph.D., lecturer; main research interests: image/video analysis and understanding, computer vision, and machine learning.
  • Supported by:
    National Natural Science Foundation of China (61902204); Natural Science Foundation of Shandong Province of China (ZR2019BF028)

Abstract: Variation in the spatial scale of pedestrians is one of the main bottlenecks limiting pedestrian detection performance. To address this problem, a Trans-Scale Feature Aggregation Network (TS-FAN) is proposed for detecting multi-scale pedestrians effectively. First, in view of the feature differences exhibited across scale spaces, a scale compensation strategy based on a multi-path Region Proposal Network (RPN) is introduced, which adaptively generates, on each multi-scale convolutional feature layer, a set of candidate object scales matched to the size of that layer's receptive field. Second, considering that convolutional features at different levels are complementary in visual semantics, a trans-scale feature aggregation module is proposed. Through lateral connections, a top-down path, and a bottom-up path, it aggregates semantically robust high-level features with precisely localized low-level features, yielding an enhanced representation of the convolutional features. Finally, the multi-path RPN scale compensation strategy and the trans-scale feature aggregation module are combined to build a scale-adaptive multi-scale pedestrian detection network. Experimental results show that, compared with the state-of-the-art pedestrian detection method TLL-TFA, the proposed method reduces the log-average miss rate on the full Caltech test set (All: pedestrians taller than 20 pixels) to 26.21% (an improvement of 11.94%), and on the Caltech small-scale pedestrian subset (Far: pedestrians between 20 and 30 pixels in height) to 47.30% (an improvement of 12.79%). A significant improvement is also achieved on the ETH dataset, which exhibits drastic scale variation.
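
The aggregation pattern named in the abstract (lateral connections, a top-down path, and a bottom-up path) can be illustrated with a minimal NumPy sketch. It is not the authors' TS-FAN module: the abstract does not give the layers, channel widths, or fusion operators, so the function names, the 1x1 lateral projection, nearest-neighbour upsampling, 2x2 max pooling, and element-wise addition below are all assumptions chosen for illustration.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    """2x2 max pooling with stride 2; H and W are assumed to be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def lateral(x, weight):
    """1x1 convolution: mixes channels independently at each position."""
    return np.einsum("oc,chw->ohw", weight, x)

def aggregate(features, lateral_weights):
    """Fuse feature maps ordered fine -> coarse (hypothetical helper).

    Each level is assumed to have half the resolution of the previous one.
    """
    # Lateral connections bring every level to a common channel width.
    lat = [lateral(f, w) for f, w in zip(features, lateral_weights)]

    # Top-down path: carry coarse, semantic features into finer levels.
    top_down = [None] * len(lat)
    top_down[-1] = lat[-1]
    for i in range(len(lat) - 2, -1, -1):
        top_down[i] = lat[i] + upsample2x(top_down[i + 1])

    # Bottom-up path: carry fine, well-localized features into coarser levels.
    out = [top_down[0]]
    for i in range(1, len(lat)):
        out.append(top_down[i] + downsample2x(out[i - 1]))
    return out

# Toy input: three random levels standing in for backbone feature maps.
rng = np.random.default_rng(0)
feats = [rng.standard_normal(s) for s in [(8, 16, 16), (16, 8, 8), (32, 4, 4)]]
weights = [rng.standard_normal((4, f.shape[0])) for f in feats]
print([o.shape for o in aggregate(feats, weights)])
# [(4, 16, 16), (4, 8, 8), (4, 4, 4)]
```

Every output level keeps its own resolution, so a per-level proposal branch, as in a multi-path RPN, could in principle be attached to each one.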

Key words: pedestrian detection, scale perception, feature pyramid, feature aggregation, non-maximum suppression
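
The results in the abstract are reported as log-average miss rate. A minimal sketch of the definition commonly used with the Caltech benchmark follows: the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space over [10^-2, 10^0] and averaged geometrically. The function name, the choice to count a reference point the curve never reaches as a miss rate of 1.0, and the toy curve are assumptions for illustration, not values or code from the paper.

```python
import math

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Summarize a miss-rate/FPPI curve as a single number.

    fppi: FPPI values in increasing order along the curve.
    miss_rate: miss rate at each of those operating points.
    """
    # Reference FPPI values evenly spaced in log space over [1e-2, 1e0].
    refs = [10 ** (-2 + 2 * k / (num_points - 1)) for k in range(num_points)]

    samples = []
    for ref in refs:
        # Use the operating point with the largest FPPI not exceeding ref.
        eligible = [m for f, m in zip(fppi, miss_rate) if f <= ref]
        # Assumption: if the curve never gets that low, nothing is detected.
        samples.append(eligible[-1] if eligible else 1.0)

    # Geometric mean, i.e. an arithmetic mean taken in log space.
    logs = [math.log(max(m, 1e-10)) for m in samples]
    return math.exp(sum(logs) / len(logs))

# Made-up flat curve: a detector that always misses a quarter of pedestrians.
print(round(log_average_miss_rate([0.001, 10.0], [0.25, 0.25]), 4))  # 0.25
```

Because the average is taken in log space, improvements at low miss rates weigh more heavily than the same absolute change at high miss rates.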
