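Adaptive algorithm for target tracking scale based on image semantic segmentation

CHEN Kai, ZHAO Xiaodong, HUANG Yujie, WANG Pengfei, LEI Yichen, ZHANG Biao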

Citation: CHEN K, ZHAO X D, HUANG Y J, et al. Adaptive algorithm for target tracking scale based on image semantic segmentation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2026, 52(5): 1536-1546 (in Chinese).

doi: 10.13700/j.bh.1001-5965.2024.0197
More Information
    Corresponding author:

    E-mail: xdzhao@nuaa.edu.cn

  • CLC number: TP391.4

Funds: 

National Natural Science Foundation of China (52202417); China Postdoctoral Science Foundation (2022TQ0155,2022M721605); Open Project Program of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (VRLAB2023A02); Young Elite Scientists Sponsorship Program by China Association for Science and Technology (2023QNRC001); Young Elite Scientists Sponsorship Program by Jiangsu Association for Science and Technology (JSTJ-2023-XH032)

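  • Abstract:

    To make full use of the target's semantic information obtained by image semantic segmentation in complex scenes, and to optimize the scale of a target tracker's output, an image semantic segmentation network optimized with an attention mechanism is designed. The network optimizes two aspects of a tracker, its output and its feature input, and can be used plug-and-play with various tracking algorithms. The segmentation mask is used to obtain the rotated bounding box of the target, and the rotated and non-rotated box boundaries are then used to denoise the features at the input stage, which weakens the influence of background noise on the tracker's discrimination. The structure of the designed network, its training, the calibration of the rotated target box, and the denoising of the tracker's input features are discussed in turn. Experiments on the public datasets OTB100, VOT2016, and VOT2018 comparatively verify the accuracy and robustness of the target motion model in optimizing the target scale during tracking.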
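The two mask-based steps named in the abstract (deriving a box from a segmentation mask, and suppressing background in the tracker's input) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `min_max_box` and `suppress_background`, the `attenuation` parameter, and the toy data are assumptions made for illustration. Only the non-rotated (axis-aligned) box is covered; fitting the rotated box would require a minimum-area rectangle step that is omitted here.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]           # binary mask, 1 = target pixel
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def min_max_box(mask: Mask) -> Optional[Box]:
    """Axis-aligned box enclosing all foreground pixels, or None if empty."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def suppress_background(patch: List[List[float]], mask: Mask,
                        attenuation: float = 0.0) -> List[List[float]]:
    """Scale every value outside the mask by `attenuation` (hypothetical knob)."""
    return [
        [v if m else v * attenuation for v, m in zip(p_row, m_row)]
        for p_row, m_row in zip(patch, mask)
    ]


# Toy 4x5 mask and a constant feature patch of the same size.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
patch = [[1.0] * 5 for _ in mask]

box = min_max_box(mask)
denoised = suppress_background(patch, mask, attenuation=0.25)
print(box)          # (1, 1, 3, 3)
print(denoised[2])  # [0.25, 1.0, 1.0, 1.0, 0.25]
```

A nonzero `attenuation` keeps some context around the target instead of zeroing it out; which setting suits a given tracker is not stated in the abstract.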

     

  • Figure 1.  Overall flow chart of the proposed algorithm

    Figure 2.  Structure of the proposed image semantic segmentation network

    Figure 3.  Structure of the upsampling optimization module

    Figure 4.  Structure of the deep residual module with the feature correction module added

    Figure 5.  Comparison of three algorithms for generating bounding boxes from binary masks

    Figure 6.  Schematic diagram of three masks for denoising the target background

    Figure 7.  Output response of the discriminator of the correlation filter tracking algorithm ECO after semantic expansion and after denoising target features with a cosine window

    Figure 8.  Success rate and precision on the OTB2015 benchmark

    Figure 9.  Qualitative results of ECO-S and SiamMask on two video sequences of the OTB100 dataset

    Figure 10.  Experimental comparison on the VOT2016 benchmark

    Figure 11.  Experimental comparison on the VOT2018 benchmark

    Figure 12.  Qualitative results of ECO-S and SiamMask on two video sequences of the VOT datasets

    Table 1.  Performance of different enclosing rectangular box calibration strategies on VOT2016

    Algorithm           mIoU/%   mAP@0.5/%   mAP@0.7/%
    SiamFC[6]           50.48    56.42        9.28
    SiamRPN[8]          60.02    76.20       32.47
    SiamMask-Min-max    65.05    82.99       43.09
    SiamMask-MBR        67.15    85.42       50.86
    SiamMask-Opt        71.68    90.77       60.47
    ECO-Min-max         69.33    85.14       50.23
    ECO-MBR             72.57    88.31       56.74
    ECO-Opt             74.97    91.85       68.56
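As a rough illustration of the overlap measure behind the mIoU column, the sketch below computes intersection over union for axis-aligned boxes and averages it over frames. It is a simplification under stated assumptions: the table concerns rotated boxes and the page does not show how mAP@0.5 and mAP@0.7 were computed, so `fraction_above` is only a hypothetical stand-in (the share of frames whose IoU exceeds a threshold), and all names and data here are invented for the example.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Average per-frame IoU over a sequence."""
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)


def fraction_above(preds: Sequence[Box], gts: Sequence[Box], threshold: float) -> float:
    """Share of frames whose IoU exceeds `threshold` (hypothetical stand-in metric)."""
    hits = sum(1 for p, g in zip(preds, gts) if iou(p, g) > threshold)
    return hits / len(preds)


# Two toy frames: a perfect match and a half-shifted box.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (5, 0, 15, 10)]

miou = mean_iou(preds, gts)              # (1 + 1/3) / 2
rate = fraction_above(preds, gts, 0.5)   # only the first frame passes
print(round(miou, 4), rate)              # 0.6667 0.5
```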

    Table 2.  Performance comparison on the VOT2016 and VOT2018 video sequence benchmarks

    Algorithm        Accuracy             Robustness            EAO
                     VOT2016   VOT2018    VOT2016   VOT2018     VOT2016   VOT2018
    ECO-S            0.5295    0.5405     16.5817    7.2398     0.3293    0.4083
    ECO[27]          0.4847    0.4978     15.0437   13.5112     0.3089    0.3077
    SiamMask[5]      0.5470    0.5337     17.9393    7.4410     0.3251    0.4016
    SiamRPN-S        0.5702    0.5515     19.2720   10.2551     0.3206    0.3727
    SiamRPN[8]       0.5386    0.5399     21.0817   14.2040     0.2579    0.3177
    DeepSRDCF-S      0.5072    0.5263     19.5438   17.4695     0.2773    0.2799
    DeepSRDCF[33]    0.5220    0.5016     20.3462   23.9644     0.2756    0.2282

    Table 3.  Comparison of computational overhead on OTB100, VOT2016, and VOT2018

    Algorithm        Parameters     Floating-point operation rate/(10^9 s^-1)
                                    OTB100   VOT2016   VOT2018
    ECO-S             5.92×10^6     0.39     0.46      0.51
    ECO[27]           5.34×10^6     0.40     0.46      0.50
    SiamMask[5]      12.96×10^6     0.86     0.97      1.07
    SiamRPN-S        11.13×10^6     0.81     0.89      0.91
    SiamRPN[8]       10.61×10^6     0.75     0.81      0.85
    DeepSRDCF-S      12.68×10^6     0.86     0.96      1.07
    DeepSRDCF[33]    12.02×10^6     0.85     0.96      1.06
  • [1] DING W Z, XU Q J, LIU S Y, et al. SAMF: a self-adaptive protein modeling framework[J]. Bioinformatics, 2021, 37(22): 4075-4082.
    [2] TANG L N, XU X C, WANG X, et al. An automatic object attention likelihood map correlation filter for visual tracking[C]//Proceedings of the IEEE 11th Asia-Pacific Conference on Antennas and Propagation. Piscataway: IEEE Press, 2024: 1-2.
    [3] LI B L, WANG Y, XU Y M, et al. DSST: a dual student model guided student–teacher framework for semi-supervised medical image segmentation[J]. Biomedical Signal Processing and Control, 2024, 90: 105890.
    [4] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 580-587.
    [5] HU W M, WANG Q, ZHANG L, et al. SiamMask: a framework for fast online object tracking and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3072-3089.
    [6] LEE S, KIM S. Comparative analysis of IR-IR image matching applying the deep learning-based template matching techniques[C]//Proceedings of the 23rd International Conference on Control, Automation and Systems. Piscataway: IEEE Press, 2023: 441-444.
    [7] HELD D, THRUN S, SAVARESE S. Learning to track at 100 FPS with deep regression networks[C]//Proceedings of the Computer Vision-ECCV. Berlin: Springer, 2016: 749-765.
    [8] WANG S Q, QIAN K, SHEN J L, et al. AD-SiamRPN: anti-deformation object tracking via an improved siamese region proposal network on hyperspectral videos[J]. Remote Sensing, 2023, 15(7): 1731.
    [9] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
    [10] HARIHARAN B, ARBELÁEZ P, GIRSHICK R, et al. Simultaneous detection and segmentation[C]//Proceedings of the Computer Vision-ECCV. Berlin: Springer, 2014: 297-312.
    [11] PINHEIRO P O, COLLOBERT R, DOLLÁR P. Learning to segment object candidates[C]//Proceedings of the 29th International Conference on Neural Information Processing Systems. New York: ACM, 2015(2): 1990-1998.
    [12] DAI J F, HE K M, SUN J. Instance-aware semantic segmentation via multi-task network cascades[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 3150-3158.
    [13] HARIHARAN B, ARBELÁEZ P, GIRSHICK R, et al. Hypercolumns for object segmentation and fine-grained localization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 447-456.
    [14] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
    [15] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2024-02-01]. https://arxiv.org/abs/1409.1556.
    [16] SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 1-9.
    [17] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
    [18] SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Transactions on Pattern Analysis and Machine Intelligence. Piscataway: IEEE Press, 2016: 640-651.
    [19] XIE S N, TU Z W. Holistically-nested edge detection[J]. International Journal of Computer Vision, 2017, 125(1): 3-18.
    [20] SERMANET P, KAVUKCUOGLU K, CHINTALA S, et al. Pedestrian detection with unsupervised multi-stage feature learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2013: 3626-3633.
    [21] KOONCE B. ResNet 50[C]//Proceedings of the Convolutional Neural Networks with Swift for Tensorflow. Berkeley: Apress, 2021: 63-72.
    [22] HAN F, JIANG S K, WU J M, et al. Real-time object tracking in the wild with siamese network[J]. Multimedia Tools and Applications, 2023, 82(16): 24327-24343.
    [23] CAO D, DAI R H, WANG J, et al. Fast visual tracking with squeeze and excitation region proposal network[J]. Human-centric Computing and Information Sciences, 2023, 13(7): 20.
    [24] WANG Q, GAO J, XING J L, et al. DCFNet: discriminant correlation filters network for visual tracking[EB/OL]. (2017-04-13)[2024-02-01]. https://arxiv.org/abs/1704.04057.
    [25] ZHANG H Y, LIU G X, ZHANG Y, et al. Robust multi-model visual tracking with distractor-aware template-coupled correlation filters joint learning[J]. IEEE Transactions on Multimedia, 2024, 26: 1813-1828.
    [26] FANG H, LIAO G S, LIU Y J, et al. Shadow-assisted moving target tracking based on multidiscriminant correlation filters network in video SAR[J]. IEEE Geoscience and Remote Sensing Letters, 2023, 20: 4006205.
    [27] DANELLJAN M, BHAT G, SHAHBAZ KHAN F, et al. ECO: efficient convolution operators for tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6638-6646.
    [28] RAHMAN M M. Target focused shallow transformer framework for efficient visual tracking[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(21): 23409-23410.
    [29] FU J Y, LIANG Q F, XIE Q S, et al. Object tracking based on foreground adaptive bounding box and motion state redetection[C]//Proceedings of the 3rd International Conference on Artificial Intelligence and Computer Engineering. Bellingham: SPIE, 2023: 155.
    [30] HUANG Y J, CHEN K, WANG Z Y, et al. A dense pedestrian tracking method based on fusion features under multi-vision[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(7): 2513-2525 (in Chinese).
    [31] CHEN K, SONG X, YUAN H T, et al. Fully convolutional encoder-decoder with an attention mechanism for practical pedestrian trajectory prediction[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(11): 20046-20060.
    [32] CHEN K, ZHU H H, TANG D B, et al. Future pedestrian location prediction in first-person videos for autonomous vehicles and social robots[J]. Image and Vision Computing, 2023, 134: 104671.
    [33] DANELLJAN M, HAGER G, SHAHBAZ KHAN F, et al. Convolutional features for correlation filter based visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE Press, 2015: 58-66.
Publication history
  • Received:  2024-04-07
  • Accepted:  2024-05-24
  • Published online:  2024-06-08
  • Issue published:  2026-05-26
