Target trajectory association method based on orientation constraint and re-identification feature
-
Abstract: Target trajectory association based on detection association and deep learning is one of the research hotspots in computer vision. However, existing methods lack effective spatio-temporal constraints in their design, and the generalization ability of target appearance features is insufficient, so recognition errors occur when targets differ markedly in orientation, which in turn causes frequent ID switches and incorrect associations during trajectory association. To address this problem, a target trajectory association method based on orientation constraints and re-identification features is proposed. Pedestrian orientation discrimination is introduced into pedestrian re-identification, and a pedestrian re-identification network model with an orientation constraint is presented, which improves the representation ability of target features. Combining spatio-temporal cues such as target orientation, the position information obtained from a Kalman filter, and the overlap area, a hierarchical trajectory association model based on the orientation constraint is proposed to obtain target trajectories within a single camera. For the cross-camera scenario, a simple and effective bi-directional competitive matching mechanism is introduced to achieve effective association of target trajectories. Experimental results show that the proposed method outperforms several existing methods on the metrics of the MOT datasets, reduces frequent ID switches, and effectively resolves incorrect associations when similar targets walk towards each other. With a frame rate of 19.6 frame/s, it meets the requirements of near-real-time applications.
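The abstract describes the single-camera stage only at a high level. Below is a minimal, self-contained sketch of how an orientation constraint can be combined with re-ID appearance distance and the overlap between Kalman-predicted and detected boxes into one association cost. It is an illustration under stated assumptions, not the paper's exact hierarchical model: the gate threshold ORIENT_GATE_DEG, the weights W_APP and W_IOU, and the flat (non-hierarchical) cost structure are all assumptions introduced here.

```python
# Illustrative sketch of orientation-gated data association (not the authors'
# exact formulation). ORIENT_GATE_DEG, W_APP, W_IOU are assumed values.
import numpy as np
from scipy.optimize import linear_sum_assignment

ORIENT_GATE_DEG = 90.0   # assumed gate: reject pairs whose orientations differ too much
W_APP, W_IOU = 0.7, 0.3  # assumed weights for appearance vs. spatial overlap


def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def associate(tracks, detections):
    """Match Kalman-predicted tracks to new detections.

    Each track/detection is a dict with:
      'box'    : predicted / detected bounding box [x1, y1, x2, y2]
      'feat'   : L2-normalised re-ID embedding (np.ndarray)
      'orient' : estimated body orientation in degrees
    Returns a list of (track_index, detection_index) matches.
    """
    cost = np.full((len(tracks), len(detections)), 1e6)
    for i, trk in enumerate(tracks):
        for j, det in enumerate(detections):
            # Orientation constraint: gate out pairs whose orientation
            # difference exceeds the threshold, so two similar-looking
            # pedestrians walking towards each other are not confused.
            d_orient = abs(trk['orient'] - det['orient']) % 360.0
            d_orient = min(d_orient, 360.0 - d_orient)
            if d_orient > ORIENT_GATE_DEG:
                continue
            app_dist = 1.0 - float(trk['feat'] @ det['feat'])   # cosine distance
            spa_dist = 1.0 - iou(trk['box'], det['box'])        # spatial distance
            cost[i, j] = W_APP * app_dist + W_IOU * spa_dist
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 1e6]
```

The point of the gate is that a track whose target faces one direction can never be matched to a detection facing the opposite direction, even if their appearance embeddings are close, which is exactly the failure case of similar targets moving towards each other that the abstract mentions.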
-
Table 1. Comparison of experimental results of pedestrian re-identification (values in %)
Table 2. Performance comparison of trajectory association methods on the MOT16 dataset (values in %)
-