Pedestrian trajectory prediction method based on pedestrian pose

WANG Ruiping, SONG Xiao, CHEN Kai, GONG Kaiqi, ZHANG Junfan

Citation: WANG R P, SONG X, CHEN K, et al. Pedestrian trajectory prediction method based on pedestrian pose[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(7): 1743-1754 (in Chinese). doi: 10.13700/j.bh.1001-5965.2021.0557

doi: 10.13700/j.bh.1001-5965.2021.0557
Funds: National Key R&D Program of China (2018YFB1702703)
Corresponding author: E-mail: songxiao@buaa.edu.cn
CLC number: V221+.3; TB553
  • Abstract:

    In the field of autonomous driving, pedestrian trajectory prediction has long been a research focus, and the uncertainty of pedestrian behavior makes it highly challenging. Most existing trajectory prediction methods concentrate only on the interactions between pedestrians, ignoring the influence of pedestrian intention and of other semantic information in the scene. To address this, a pedestrian-pose-based convolutional encoder-decoder network (PKCEDN) is proposed for predicting the trajectory of a target pedestrian. The method combines an encoder-decoder model built on convolutional and long short-term memory (LSTM) networks with an attention mechanism that learns the correlation between the current trajectory and past trajectories. Evaluated on the public MOT16, MOT17, and MOT20 datasets, the method reduces the average error by about 36% compared with mainstream methods such as Linear, LSTM, Social-LSTM, Social-GAN (generative adversarial network), SR-LSTM, and Msgtv, without any loss in prediction speed.
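The abstract describes an encoder-decoder built from convolutional and LSTM components. The paper's implementation is not reproduced here; the following is only a minimal PyTorch sketch of a generic Conv-LSTM cell rolled over an observed sequence, where the cell design, kernel size, and all names and shapes are illustrative assumptions (the 128@11×11 hidden map loosely mirrors the magnitudes listed in Table 1 below):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Generic Conv-LSTM cell: the four LSTM gates are produced by one
    convolution over the concatenated input and hidden feature maps."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state                                    # hidden / cell state maps
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# Encoder pass over 8 observed frames of a toy 2-channel 11x11 feature map.
cell = ConvLSTMCell(in_ch=2, hid_ch=128)
h = torch.zeros(1, 128, 11, 11)
c = torch.zeros(1, 128, 11, 11)
for x in torch.randn(8, 1, 2, 11, 11):                 # (time, batch, C, H, W)
    h, c = cell(x, (h, c))                              # h summarizes the history
```

A decoder of the same form would be rolled forward from the final encoder state to emit future position maps; the attention mechanism mentioned in the abstract would additionally weight the encoder's past hidden states. Neither is shown in this sketch.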

     

  • Figure 1. Pose representation of pedestrian intention
  • Figure 2. Structure of PKCEDN
  • Figure 3. Structure of the Conv-LSTM network
  • Figure 4. Classification of spatial semantic information
  • Figure 5. Pedestrian intention modeling with pedestrian pose
  • Figure 6. Detailed structure of the attention module in the encoder-decoder architecture
  • Figure 7. Visualization of qualitative trajectory prediction results on the MOT16 dataset
  • Figure 8. Visualization of qualitative trajectory prediction results on the MOT20 dataset

    Table 1. Network layer model parameters

    Layer                        Input        Output
    Layer1                       16@11×11     32@11×11
    Layer2                       32@11×11     64@11×11
    Layer3                       64@11×11     128@11×11
    Deconv                       2@1×1        128@11×11
    Conv-LSTM (decoder module)   128@11×11    128@11×11
    Conv                         128@11×11    2@1×1
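Read as a configuration, the Layer1-Layer3 rows describe convolutions that keep the 11×11 spatial size while widening the channels 16 → 32 → 64 → 128. A hedged PyTorch sketch of such a stack follows; only the channel counts and map sizes come from Table 1, while kernel size, padding, and activations are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Channel progression from Table 1 (Layer1-Layer3); kernel size, padding,
# and activation are assumptions, not taken from the paper.
encoder = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, padding=1),   # Layer1: 16@11x11 -> 32@11x11
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),   # Layer2: 32@11x11 -> 64@11x11
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),  # Layer3: 64@11x11 -> 128@11x11
    nn.ReLU(),
)

x = torch.randn(1, 16, 11, 11)                     # toy input matching Layer1
assert encoder(x).shape == (1, 128, 11, 11)        # spatial size is preserved
```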

    Table 2. Comparison of trajectory prediction results of the proposed method and state-of-the-art methods on the MOT16 dataset (unit: pixels)

    ADE
    Method             MOT16-02   MOT16-04   MOT16-09   MOT16-10   MOT16-13   Mean
    LSTM               30.12      32.15      30.52      30.96      31.73      31.096
    Social-LSTM        29.13      30.86      31.25      29.85      29.95      30.208
    Social-GAN         27.63      29.42      28.75      29.54      30.83      29.234
    SR-LSTM            28.73      29.65      30.74      29.31      29.03      29.492
    Msgtv              27.32      29.21      27.49      30.38      29.72      28.824
    Proposed method    19.15      17.46      19.54      16.75      18.43      18.266

    FDE
    Method             MOT16-02   MOT16-04   MOT16-09   MOT16-10   MOT16-13   Mean
    LSTM               44.64      43.85      41.64      42.43      40.45      42.602
    Social-LSTM        43.53      41.85      40.64      39.56      40.75      41.266
    Social-GAN         34.54      35.46      32.64      37.85      38.85      35.868
    SR-LSTM            38.35      37.31      38.62      34.18      32.48      36.188
    Msgtv              33.07      34.38      31.83      38.01      35.74      34.606
    Proposed method    21.11      20.43      23.66      20.78      22.54      21.704
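ADE and FDE in Tables 2 to 4 are reported in pixels. Assuming the standard definitions (ADE: Euclidean displacement error averaged over all predicted steps; FDE: displacement error at the final step only), they can be computed as in this minimal NumPy sketch:

```python
import numpy as np

def ade_fde(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """pred, gt: arrays of shape (T, 2) holding (x, y) positions in pixels.
    Returns (ADE, FDE): mean and final-step Euclidean displacement."""
    dist = np.linalg.norm(pred - gt, axis=1)   # per-step displacement
    return dist.mean(), dist[-1]

pred = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 4.0]])
gt   = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
ade, fde = ade_fde(pred, gt)   # ade = (0 + 0.5 + 1.0) / 3 = 0.5, fde = 1.0
```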

    Table 3. Comparison of trajectory prediction results of PKCEDN and state-of-the-art methods on the MOT17 dataset (unit: pixels)

    ADE
    Method             MOT17-02   MOT17-03   MOT17-08   MOT17-11   MOT17-14   Mean
    LSTM               35.15      36.31      33.85      35.21      39.38      35.98
    Social-LSTM        30.01      31.48      32.89      35.02      33.64      32.608
    Social-GAN         28.79      30.13      28.62      30.93      30.48      29.79
    SR-LSTM            29.13      30.53      29.24      30.15      29.61      29.732
    Msgtv              26.53      30.13      28.75      29.32      27.54      28.454
    Proposed method    20.04      17.86      18.42      15.84      18.05      18.042

    FDE
    Method             MOT17-02   MOT17-03   MOT17-08   MOT17-11   MOT17-14   Mean
    LSTM               50.64      55.97      61.57      60.02      63.42      58.324
    Social-LSTM        44.64      52.18      54.15      43.63      42.17      47.354
    Social-GAN         35.64      39.24      38.02      37.89      35.93      37.344
    SR-LSTM            36.73      40.13      37.85      36.47      34.64      37.164
    Msgtv              34.53      37.64      35.18      37.56      33.17      35.616
    Proposed method    24.75      23.84      25.03      22.41      25.75      24.356

    Table 4. Comparison of trajectory prediction results of PKCEDN and state-of-the-art methods on the MOT20 dataset (unit: pixels)

    ADE
    Method             MOT20-01   MOT20-02   MOT20-03   MOT20-05   Mean
    LSTM               32.13      34.98      29.76      30.65      31.88
    Social-LSTM        30.10      33.21      30.41      29.98      30.93
    Social-GAN         27.02      26.98      26.81      24.06      26.22
    SR-LSTM            28.31      33.01      28.73      29.53      29.895
    Msgtv              28.64      27.16      26.14      24.98      26.730
    Proposed method    18.64      17.85      20.13      14.43      17.763

    FDE
    Method             MOT20-01   MOT20-02   MOT20-03   MOT20-05   Mean
    LSTM               46.54      51.83      53.54      53.41      51.33
    Social-LSTM        43.83      44.12      43.84      40.75      43.14
    Social-GAN         32.79      30.81      33.56      31.84      32.25
    SR-LSTM            36.64      40.37      39.21      34.78      37.750
    Msgtv              33.69      32.42      34.15      31.19      32.863
    Proposed method    24.74      21.63      24.31      19.46      22.535

    Table 5. Ablation study of the proposed method (MOT16, unit: pixels)

    Variant        MADE      MFDE
    Tenl           24.455    31.324
    Tenl + Tenp    22.543    30.041
    Tenh           21.412    27.544
    Tenp + Tenh    18.266    21.704

    (MADE/MFDE denote the mean ADE/FDE over the MOT16 sequences; the Tenp + Tenh row matches the proposed method's means in Table 2.)

    Table 6. Comparison with state-of-the-art methods on pre-processing and prediction speed

    Method            Data pre-processing module   Prediction module speed/(times·s⁻¹)   Speed-up/%
    LSTM              0.03                         0.04                                   76.29
    Social-LSTM       2.13                         3.21                                   1
    Social-GAN        0.20                         0.26                                   11.60
    SR-LSTM           1.24                         2.01                                   1.64
    Msgtv             0.24                         0.38                                   8.61
    Proposed method   0.54                         0.28                                   6.51
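The prediction-module column of Table 6 is a rate in runs per second. The paper's benchmarking code is not available here; the following is only a generic timing sketch of how such a figure can be measured, with the model and input as toy placeholders:

```python
import time
import torch
import torch.nn as nn

@torch.no_grad()
def runs_per_second(model: nn.Module, sample: torch.Tensor, n_runs: int = 100) -> float:
    """Time n_runs forward passes and return the runs-per-second rate."""
    model.eval()
    start = time.perf_counter()
    for _ in range(n_runs):
        model(sample)
    return n_runs / (time.perf_counter() - start)

# Toy stand-in model and input, just to make the sketch runnable.
toy = nn.Linear(16, 4)
print(f"{runs_per_second(toy, torch.randn(8, 16)):.1f} runs/s")
```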
Publication history
  • Received: 2021-09-26
  • Accepted: 2022-01-02
  • Available online: 2022-02-24
  • Issue published: 2023-07-31
