基于双注意力混洗的无人机航拍目标跟踪算法

金国栋; 薛远亮; 谭力宁; 许剑锟

doi:10.13700/j.bh.1001-5965.2021.0177

基于双注意力混洗的无人机航拍目标跟踪算法

doi: 10.13700/j.bh.1001-5965.2021.0177

火箭军工程大学核工程学院，西安 710025

基金项目: 国家自然科学基金 (61673017,61403398)

详细信息

作者简介:
金国栋等：基于双注意力混洗的多尺度无人机航拍目标跟踪算法 13

通讯作者:
E-mail：641797825@qq.com

中图分类号: V279；TP391
计量
- 文章访问数: 357
- HTML全文浏览量: 107
- PDF下载量: 63
- 被引次数: 0
出版历程
- 收稿日期: 2021-04-07
- 录用日期: 2021-05-28
- 网络出版日期: 2021-06-16
- 整期出版日期: 2023-01-30

Aerial object tracking algorithm for UAVs based on dual-attention shuffling

School of Nuclear Engineering，Rocket Force University of Engineering，Xi’an 710025，China

Funds: National Natural Science Foundation of China (61673017,61403398)

More Information

Corresponding author: E-mail：641797825@qq.com

摘要

摘要:
针对无人机（UAV）跟踪过程中目标经常出现尺寸小、尺度变化大和相似物干扰等问题，提出了一种基于双注意力混洗的多尺度无人机实时跟踪算法。考虑到无人机视角下目标像素点少，构建了双采样融合的深层网络，既提供了语义信息丰富的深度特征,又保留了目标的细节信息；设计了双注意力混洗模块，通道注意力和空间注意力同时分组筛选提取到的特征信息，混洗不同通道间的信息，加强信息交流，提高了算法辨别能力；为利用不同层的特征信息，加入多个区域建议网络完成目标的分类和回归，并针对无人机的目标特点，将结果进行加权融合。实验结果表明：所提算法在数据集上的成功率和准确率分别为60.3%和79.3%，速度为37.5 帧/s。所提算法的辨别能力和多尺度适应能力明显增强，能有效应对无人机跟踪中常见的挑战。
- 无人机 /
- 目标跟踪 /
- 注意力模块 /
- 混洗 /
- 区域建议网络
Abstract:
A multi-scale real-time tracking algorithm for unmanned aerial vehicle (UAV) based on dual-attention shuffling is proposed to solve the problems of small size, large scale variation and similar object interference which often occur during UAV object tracking. First, considering the small number of target pixels in the UAV view, a deep network with double sampling integration is constructed, which provides semantic information-rich depth features and preserves the target’s detailed information. Next, a dual-attention shuffling module is designed. Channel attention and spatial attention are simultaneously grouped to filter the extracted feature information, and then the information between different channels is shuffled to enhance information exchange and improve the discriminative ability of the algorithm. Finally, to utilize the feature information of different layers, multiple region proposal networks are added to complete the target classification and regression, and the results are weighted and fused for the UAV target characteristics. Results show that the success and precision rates of the algorithm are 60.3% and 79.3% on the dataset, respectively, with 37.5 frame/s. The algorithm discrimination ability and multi-scale adaptation are significantly enhanced, which can effectively deal with the common challenges in UAV tracking.
- unmanned aerial vehicle /
- object tracking /
- attention module /
- shuffle /
- region proposal network

HTML全文

图 1 残差模块

Figure 1. ResBlock

下载: 全尺寸图片幻灯片

图 2 本文算法网络结构

Figure 2. Structure of the proposed algorithm network

下载: 全尺寸图片幻灯片

图 3 投影映射的下采样

Figure 3. Down-sampling of projection mapping

下载: 全尺寸图片幻灯片

图 4 步长为2、大小为1的卷积过程

Figure 4. Convolution process with a step of 2 and size of 1

下载: 全尺寸图片幻灯片

图 5 双注意力混洗

Figure 5. Dual-attention shuffling

下载: 全尺寸图片幻灯片

图 6 通道混洗

Figure 6. Channel shuffle

下载: 全尺寸图片幻灯片

图 7 Grad-CAM热力图

Figure 7. Heatmap of Grad-CAM

下载: 全尺寸图片幻灯片

图 8 RPN模块

Figure 8. Region proposal network module

下载: 全尺寸图片幻灯片

图 9 算法实现步骤

Figure 9. Algorithm flowchart

下载: 全尺寸图片幻灯片

图 10 部分跟踪结果

Figure 10. Some tracking results

下载: 全尺寸图片幻灯片

图 11 不同算法在UAV123数据集上的整体评估结果

Figure 11. Overall evaluation results of different algorithms on UAV123 dataset

下载: 全尺寸图片幻灯片

下载: 全尺寸图片幻灯片

不同算法在UAV123数据集上各属性的评估结果

Evaluation results of different algorithms in terms of different attributes of UAV123 dataset

下载: 全尺寸图片幻灯片

图 13 无人机拍摄视频跟踪结果

Figure 13. Tracking results of UAV shooting video

下载: 全尺寸图片幻灯片

参考文献(38)

[1]	孟琭, 杨旭. 目标跟踪算法综述[J]. 自动化学报, 2019, 45(7): 1244-1260. doi: 10.16383/j.aas.c180277 MENG L, YANG X. A survey of object tracking algorithms[J]. Acta Automatica Sinica, 2019, 45(7): 1244-1260(in Chinese). doi: 10.16383/j.aas.c180277
[2]	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional siamese networks for object tracking[C]//Proceedings of the 2016 European Conference on Computer Vision(ECCV). Berlin: Springer, 2016, 9914: 850-865.
[3]	HUANG C, LUCEY S, RAMANAN D. Learning policies for adaptive tracking with deep feature cascades[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2017: 105-114.
[4]	LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
[5]	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. doi: 10.1109/TPAMI.2016.2577031
[6]	LI B, WU W, WANG Q, et al. SiamRPN++: Evolution of Siamese visual tracking with very deep networks[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4277-4286.
[7]	FU C, CAO Z, LI Y, et al. Siamese anchor proposal network for high-speed aerial tracking[EB/OL]. (2021-03-26)[2021-04-05]. https://arxiv.org/abs/2012.10706.
[8]	刘芳, 孙亚楠, 王洪娟, 等. 基于残差学习的自适应无人机目标跟踪算法[J]. 北京航空航天大学学报, 2020, 46(10): 1874-1882. doi: 10.13700/j.bh.1001-5965.2019.0551 LIU F, SUN Y N, WANG H J, et al. Adaptive UAV target tracking algorithm based on residual learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(10): 1874-1882(in Chinese). doi: 10.13700/j.bh.1001-5965.2019.0551
[9]	刘芳, 杨安喆, 吴志威. 基于自适应Siamese网络的无人机目标跟踪算法[J]. 航空学报, 2020, 41(1): 243-255. LIU F, YANG A Z, WU Z W. Adaptive Siamese network based UAV target tracking algorithm[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(1): 243-255(in Chinese).
[10]	HE A, LUO C, TIAN X, et al. A twofold Siamese network for real-time object tracking[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4834-4843.
[11]	HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023. doi: 10.1109/TPAMI.2019.2913372
[12]	钟莎, 黄玉清. 基于孪生区域候选网络的无人机指定目标跟踪[J]. 计算机应用, 2021, 41(2): 523-529. doi: 10.11772/j.issn.1001-9081.2020060762 ZHONG S, HUANG Y Q. Tracking of specified target of unmanned aerial vehicle based on Siamese region proposal network[J]. Journal of Computer Applications, 2021, 41(2): 523-529(in Chinese). doi: 10.11772/j.issn.1001-9081.2020060762
[13]	HE K M, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[14]	MA N, ZHANG X, ZHENG H T, et al. ShuffleNet V2: Practical guidelines for efficient CNN architecture design[C]//Proceedings of the 2018 European Conference on Computer Vision(ECCV). Berlin: Springer, 2018, 11218: 122-138.
[15]	ITTI L, KOCH C, NIEBUR E. A model of saliency-based visual attention for rapid scene analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254-1259. doi: 10.1109/34.730558
[16]	LAROCHELLE H, HINTON G E. Learning to combine foveal glimpses with a third-order Boltzmann machine[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems. New York: Curran Associates Inc, 2010, 1: 1243-1251.
[17]	ZHANG Q L, YANG Y B. SA-Net: Shuffle attention for deep convolutional neural networks[C]//Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2021: 2235-2239.
[18]	MAX J, KAREN S, ANDREW Z, et al. Spatial transformer networks[C]//Proceedings of the 2015 Advances in Neural Information Processing Systems(NIPS). New York: Curran Associates Inc, 2015, 28: 2017-2025.
[19]	WANG Q, WU B, ZHU P, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 11531-11539.
[20]	WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the 2018 European Conference on Computer Vision(ECCV). Berlin: Springer, 2018, 11211: 3-19.
[21]	FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3141-3149.
[22]	NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4293-4302.
[23]	KAREN S, ANDREW Z. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2021-04-05].https://arxiv.org/abs/1409.1556.
[24]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. doi: 10.1145/3065386
[25]	WANG Q, TENG Z, XING J, et al. Learning attentions: Residual attentional Siamese network for high performance online visual tracking[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4854-4863.
[26]	WU Y, HE K. Group normalization[C]//Proceedings of the 2018 European Conference on Computer Vision(ECCV). Berlin: Springer, 2018: 3-19.
[27]	RAMPRASAATH R S, MICHAEL C, ABHISHEK D, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision(ICCV). Piscataway: IEEE Press, 2017: 618-626.
[28]	ROMAN P. An in-depth analysis of visual tracking with Siamese neural networks[EB/OL]. (2018-08-02)[2021-04-05]. https://arxiv. org/abs/1707.00569.
[29]	FAN H, LING H. Siamese cascaded region proposal networks for real-time visual tracking[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7944-7953.
[30]	ESTEBAN R, JONATHON S, STEFANO M, et al. YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 7464-7473.
[31]	OLGA R, JIA D, HAO S, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. doi: 10.1007/s11263-015-0816-y
[32]	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//Proceedings of the 2014 European Conference on Computer Vision(ECCV). Berlin: Springer, 2014: 740-755.
[33]	MATTHIAS M, NEIL S, BERNARD G. A benchmark and simulator for UAV tracking[C]//Proceedings of the 2016 European Conference on Computer Vision(ECCV). Berlin: Springer, 2016: 445-461.
[34]	MARTIN D, GUSTAV H, FAHAD S K. Accurate scale estimation for robust visual tracking[C]//Proceedings of the 2014 British Machine Vision Conference(BMVC). Berlin: Springer, 2014: 1-11.
[35]	HARE S, GOLODETZ S, SAFFARI A, et al. Struck: Structured output tracking with kernels[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2096-2109. doi: 10.1109/TPAMI.2015.2509974
[36]	ZHANG J, MA S, SCLAROFF S. MEEM: Robust tracking via multiple experts using entropy minimization[C]//Proceedings of the 2014 European Conference on Computer Vision(ECCV). Berlin: Springer, 2014: 188-203.
[37]	LI Y, ZHU J. A scale adaptive kernel correlation filter tracker with feature integration[C]//Proceedings of the 2014 European Conference on Computer Vision(ECCV). Berlin: Springer, 2015: 254-265.
[38]	MARTIN D, GUSTAV H, FAHAD S K, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2015: 4310-4318.