基于行人姿态的轨迹预测方法

王瑞平; 宋晓; 陈凯; 龚开奇; 张峻凡

doi:10.13700/j.bh.1001-5965.2021.0557

基于行人姿态的轨迹预测方法

doi: 10.13700/j.bh.1001-5965.2021.0557

王瑞平¹,
宋晓^2, ,,
陈凯³,
龚开奇²,
张峻凡⁴

1.
北京航空航天大学自动化科学与电气工程学院，北京 100191
2.
北京航空航天大学网络空间安全学院，北京 100191
3.
南京航空航天大学机电学院，南京 211106
4.
北京航空航天大学宇航学院，北京 100191

基金项目: 国家重点研发计划(2018YFB1702703)

详细信息

通讯作者:
E-mail：songxiao@buaa.edu.cn

中图分类号: V221⁺.3；TB553
计量
- 文章访问数: 484
- HTML全文浏览量: 74
- PDF下载量: 67
- 被引次数: 0
出版历程
- 收稿日期: 2021-09-26
- 录用日期: 2022-01-02
- 网络出版日期: 2022-02-24
- 整期出版日期: 2023-07-31

Pedestrian trajectory prediction method based on pedestrian pose

1.
School of Automation Science and Electrical Engineering，Beihang University，Beijing 100191， China
2.
School of Cyber Science and Technology，Beihang University，Beijing 100191，China
3.
College of Mechanical and Electrical Engineering，Nanjing University of Aeronautics and Astronautics，Nanjing 211106，China
4.
School of Astronautics，Beihang University，Beijing 100191，China

Funds: National Key R & D Program of China (2018YFB1702703)

More Information

Corresponding author: E-mail：songxiao@buaa.edu.cn

摘要

摘要:
在自动驾驶领域，行人轨迹预测一直是研究热点之一，行人行为的不确定性给轨迹预测带来很大的挑战。目前大部分轨迹预测方法只专注于行人之间的信息交互，忽略了行人意图和场景中其他语义信息对行人轨迹的影响。为此，提出一种基于行人姿态的卷积编码器-解码器网络(PKCEDN)来预测目标行人轨迹的方法，所提方法包含基于卷积、长短时记忆（LSTM）网络的编码器-解码器模型和能够学习当前时刻与过去时刻轨迹相关性的注意力机制。所提方法在MOT16、MOT17和MOT20公开数据集上进行了相关测试，与Linear、LSTM、Social-LSTM、Social-生成对抗网络（GAN）、SR-LSTM和M_sgtv等主流方法相比，在保证预测速度不降低的前提下，平均误差降低约36%。
- 行人轨迹预测 /
- 姿态提取 /
- 编码器-解码器 /
- 注意力机制 /
- 空间语义信息
Abstract:
In the field of autonomous driving, pedestrian trajectory prediction has been one of the research hotspots, and the uncertainty of pedestrian behavior poses a great challenge to trajectory prediction. Most of the current trajectory prediction methods only focus on the information interaction between pedestrians, ignoring the influence of pedestrian intention and other semantic information in the scene on the pedestrian trajectory. In order to achieve this, this paper suggests a method for predicting target pedestrian trajectory using pose keypoints based convolutional encoder-decoder network (PKCEDN). The method includes an attention mechanism that can learn the relationship between the current moment and past moment trajectories, as well as an encoder-decoder model based on convolutional, long and short-term memory (LSTM) networks. The proposed method has been tested on the MOT16, MOT17, and MOT20 public datasets, and the average error is reduced by about 36% compared to mainstream methods such as Linear, LSTM, Social-LSTM, Social-GAN, SR-LSTM, and M_sgtv, while ensuring no reduction in prediction speed.
- pedestrian trajectory prediction /
- pose extraction /
- encoder-decoder /
- attention mechanisms /
- spatial semantic information

HTML全文

图 1 姿态表征行人意图

Figure 1. Pose representation of pedestrian intention

下载: 全尺寸图片幻灯片

图 2 PKCEDN的结构

Figure 2. Structure of PKCEDN

下载: 全尺寸图片幻灯片

图 3 Conv-LSTM网络结构

Figure 3. Structure of Conv-LSTM network

下载: 全尺寸图片幻灯片

图 4 空间语义信息分类

Figure 4. Spatial semantic information classification

下载: 全尺寸图片幻灯片

图 5 行人姿态建模与行人意图

Figure 5. Pedestrian intention modeling with pedestrian pose

下载: 全尺寸图片幻灯片

图 6 编码器-解码器中的注意力模块详细结构

Figure 6. Detailed structure attention module for encoder-decoder architecture

下载: 全尺寸图片幻灯片

图 7 MOT16数据集上，预测轨迹定性结果的可视化

Figure 7. Trajectories predictions qualitative results visualization on MOT16 datasets

下载: 全尺寸图片幻灯片

图 8 MOT20数据集上，预测轨迹定性结果的可视化

Figure 8. Trajectories predictions qualitative results visualization on MOT20 datasets

下载: 全尺寸图片幻灯片

表 1 网络层模型参数列表

Table 1. Network layer model parameters list

网络层	输入	输出
Layer1	16@11×11	32@11×11
Layer2	32@11×11	64@11×11
Layer3	64@11×11	128@11×11
Deconv	2@1×1	128@11×11
Conv-LSTM (Decoder Module)	128@11×11	128@11×11
Conv	128@11×11	2@1×1

下载: 导出CSV

表 2 本文方法与主流方法的轨迹预测结果在MOT16数据集上的比较

Table 2. Comparison of trajectory prediction results of proposed method and state-of-the-art methods on MOT16 dataset 像素

方法	ADE					ADE均值
方法	MOT16-02	MOT16-04	MOT16-09	MOT16-10	MOT16-13	ADE均值
LSTM	30.12	32.15	30.52	30.96	31.73	31.096
Social-LSTM	29.13	30.86	31.25	29.85	29.95	30.208
Social-GAN	27.63	29.42	28.75	29.54	30.83	29.234
SR-LSTM	28.73	29.65	30.74	29.31	29.03	29.492
M_sgtv	27.32	29.21	27.49	30.38	29.72	28.824
本文方法	19.15	17.46	19.54	16.75	18.43	18.266

方法	FDE					FDE均值
方法	MOT16-02	MOT16-04	MOT16-09	MOT16-10	MOT16-13	FDE均值
LSTM	44.64	43.85	41.64	42.43	40.45	42.602
Social-LSTM	43.53	41.85	40.64	39.56	40.75	41.266
Social-GAN	34.54	35.46	32.64	37.85	38.85	35.868
SR-LSTM	38.35	37.31	38.62	34.18	32.48	36.188
M_sgtv	33.07	34.38	31.83	38.01	35.74	34.606
本文方法	21.11	20.43	23.66	20.78	22.54	21.704

下载: 导出CSV

表 3 本文方法与主流方法的轨迹预测结果在MOT17数据集上的比较

Table 3. Comparison of trajectory prediction results of PKCEDN and state-of-the-art methods on MOT17 dataset 像素

方法	ADE					ADE均值
方法	MOT17-02	MOT17-03	MOT17-08	MOT17-11	MOT17-14	ADE均值
LSTM	35.15	36.31	33.85	35.21	39.38	35.98
Social-LSTM	30.01	31.48	32.89	35.02	33.64	32.608
Social-GAN	28.79	30.13	28.62	30.93	30.48	29.79
SR-LSTM	29.13	30.53	29.24	30.15	29.61	29.732
M_sgtv	26.53	30.13	28.75	29.32	27.54	28.454
本文方法	20.04	17.86	18.42	15.84	18.05	18.042

方法	FDE					FDE均值
方法	MOT17-02	MOT17-03	MOT17-08	MOT17-11	MOT17-14	FDE均值
LSTM	50.64	55.97	61.57	60.02	63.42	58.324
Social-LSTM	44.64	52.18	54.15	43.63	42.17	47.354
Social-GAN	35.64	39.24	38.02	37.89	35.93	37.344
SR-LSTM	36.73	40.13	37.85	36.47	34.64	37.164
M_sgtv	34.53	37.64	35.18	37.56	33.17	35.616
本文方法	24.75	23.84	25.03	22.41	25.75	24.356

下载: 导出CSV

表 4 本文方法与主流方法的轨迹预测结果在MOT20数据集上的比较

Table 4. Comparison of trajectory prediction results of PKCEDN and state-of-the-art methods on MOT20 dataset 像素

方法	ADE				ADE均值
方法	MOT20-01	MOT20-02	MOT20-03	MOT20-05	ADE均值
LSTM	32.13	34.98	29.76	30.65	31.88
Social-LSTM	30.10	33.21	30.41	29.98	30.93
Social-GAN	27.02	26.98	26.81	24.06	26.22
SR-LSTM	28.31	33.01	28.73	29.53	29.895
M_sgtv	28.64	27.16	26.14	24.98	26.730
本文方法	18.64	17.85	20.13	14.43	17.763

方法	FDE				FDE均值
方法	MOT20-01	MOT20-02	MOT20-03	MOT20-05	FDE均值
LSTM	46.54	51.83	53.54	53.41	51.33
Social-LSTM	43.83	44.12	43.84	40.75	43.14
Social-GAN	32.79	30.81	33.56	31.84	32.25
SR-LSTM	36.64	40.37	39.21	34.78	37.750
M_sgtv	33.69	32.42	34.15	31.19	32.863
本文方法	24.74	21.63	24.31	19..46	22.535

下载: 导出CSV

表 5 本文方法的消融实验

Table 5. Ablation study of proposed method

算法	参数	数值(MOT16)/像素
Ten_l	MADE	24.455
Ten_l	MFDE	31.324
Ten_l + Ten_p	MADE	22.543
Ten_l + Ten_p	MFDE	30.041
Ten_h	MADE	21.412
Ten_h	MFDE	27.544
Ten_p+Ten_h	MADE	18.266
Ten_p+Ten_h	MFDE	21.704

下载: 导出CSV

表 6 本文方法与主流方法在预处理和预测速度方面的比较

Table 6. Comparison with state-of-the-art methods on pre-processing speed and prediction speed

方法	数据预处理模块	预测模块速度/(次·s⁻¹)	速度提升/%
LSTM	0.03	0.04	76.29
Social-LSTM	2.13	3.21	1
Social-GAN	0.20	0.26	11.60
SR-LSTM	1.24	2.01	1.64
M_sgtv	0.24	0.38	8.61
本文方法	0.54	0.28	6.51

下载: 导出CSV

参考文献(39)

[1]	RODDENBERRY T M, GLAZE N, SEGARRA S. Principled simplicial neural networks for trajectory prediction[C]//Proceedings of Machine Learning Research. New York: Cornell Universitl, 2021: 9020-9029.
[2]	CHENG H, LIAO W T, YANG M Y, et al. AMENet: Attentive maps encoder network for trajectory prediction[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 172: 253-266. doi: 10.1016/j.isprsjprs.2020.12.004
[3]	CHANDRA R, GUAN T R, PANUGANTI S, et al. Forecasting trajectory and behavior of road-agents using spectral clustering in graph-LSTMs[J]. IEEE Robotics and Automation Letters, 2020, 5(3): 4882-4890. doi: 10.1109/LRA.2020.3004794
[4]	HADDAD S, LAM S K. Self-growing spatial graph networks for pedestrian trajectory prediction[C]//2020 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2020: 1140-1148.
[5]	WANG R, CUI Y, SONG X, et al. Multi-information-based convolutional neural network with attention mechanism for pedestrian trajectory prediction[J]. Image and Vision Computing, 2021, 107: 104110. doi: 10.1016/j.imavis.2021.104110
[6]	LIANG J W, JIANG L, NIEBLES J C, et al. Peeking into the future: Predicting future person activities and locations in videos[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Piscataway: IEEE Press, 2020: 5718-5727.
[7]	ALAHI A, GOEL K, RAMANATHAN V, et al. Social LSTM: Human trajectory prediction in crowded spaces[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition . Piscataway: IEEE Press, 2016: 961-971.
[8]	FERNANDO T, DENMAN S, SRIDHARAN S, et al. GD-GAN: Generative adversarial networks for trajectory prediction and group detection in crowds[M]. Computer Vision — ACCV 2018. Berlin: Springer, 2019: 314-330.
[9]	MANH H, ALAGHBAND G. ScenSe-LSTM: A model for human trajectory prediction[EB/OL]. (2019-04-15)[2021-08-20]. https://arxiv.orglabs/1808.04018.
[10]	SONG X, CHEN K, LI X, et al. Pedestrian trajectory prediction based on deep convolutional LSTM network[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(6): 3285-3302. doi: 10.1109/TITS.2020.2981118
[11]	GRAVES A, GRAVES A. Long short-term memory[M]. Supervised Sequence Labelling with Recurrent Neural Networks. Berlin: Springer, 2012: 37-45.
[12]	HELBING D, FARKAS I, VICSEK T. Simulating dynamical features of escape panic[J]. Nature, 2000, 407(6803): 487-490. doi: 10.1038/35035023
[13]	MORRIS B T, TRIVEDI M M. Trajectory learning for activity understanding: Unsupervised, multilevel, and long-term adaptive approach[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(11): 2287-2301. doi: 10.1109/TPAMI.2011.64
[14]	HELBING D, MOLNÁR P. Social force model for pedestrian dynamics[J]. Physical Review E, 1995, 51(5): 4282-4286. doi: 10.1103/PhysRevE.51.4282
[15]	KIM J, AHN C, LEE S. Modeling handicapped pedestrians considering physical characteristics using cellular automaton[J]. Physica A:Statistical Mechanics and Its Applications, 2018, 510: 507-517. doi: 10.1016/j.physa.2018.06.090
[16]	QUAN R J, ZHU L C, WU Y, et al. Holistic LSTM for pedestrian trajectory prediction[J]. IEEE Transactions on Image Processing, 2021, 30: 3229-3239. doi: 10.1109/TIP.2021.3058599
[17]	ZHAO T Y, XU Y F, MONFORT M, et al. Multi-agent tensor fusion for contextual trajectory prediction[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE press, 2019: 12126-12134.
[18]	XU Y Y, PIAO Z X, GAO S H. Encoding crowd interaction with deep neural network for pedestrian trajectory prediction[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 5275-5284.
[19]	ZOU H S, SU H, SONG S H, et al. Understanding human behaviors in crowds by imitating the decision-making process[C]//Thirty-Second AAAI Conference on Artificial Intelligence. Piscataway: AAAI Press, 2018.
[20]	TABUADA P, PAPPAS G J, LIMA P. Motion feasibility of multi-agent formations[J]. IEEE Transactions on Robotics, 2005, 21(3): 387-392.
[21]	ZHANG C J, LESSER V. Multi-agent learning with policy prediction[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: ACM, 2010, 24(1): 927-934.
[22]	RUDENKO A, PALMIERI L, HERMAN M, et al. Human motion trajectory prediction: A survey[J]. The International Journal of Robotics Research, 2020, 39(8): 895-935. doi: 10.1177/0278364920917446
[23]	GUPTA A, JOHNSON J, LI F F, et al. Social GAN: Socially acceptable trajectories with generative adversarial networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2255-2264.
[24]	VEMULA A, MUELLING K, OH J. Social attention: Modeling attention in human crowds[C]//2018 IEEE international Conference on Robotics and Automation. Piscataway: IEEE Press, 2018: 4601-4607.
[25]	KITANI K M, ZIEBART B D, BAGNELL J A, et al. Activity Forecasting[C]//European Conference on Computer Vision. Berlin: Springer, 2012: 201-214.
[26]	XIE D, SHU T M, TODOROVIC S, et al. Learning and inferring “dark matter” and predicting human intents and trajectories in videos[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(7): 1639-1652. doi: 10.1109/TPAMI.2017.2728788
[27]	JAIPURIA N, HABIBI G, How J P. A transferable pedestrian motion prediction model for intersections with different geometries[EB/OL]. (2018-06-25)[2021-08-06]. https://arxiv.orglabs/1806.09444.
[28]	SADEGHIAN A, KOSARAJU V, SADEGHIAN A, et al. SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Piscataway: IEEE Press, 2019: 1349-1358.
[29]	NIKHIL N, MORRIS B T. Convolutional Neural Network for Trajectory Prediction[C]//European Conference on Computer Vision. Berlin: Springer, 2019: 186-196.
[30]	SIMON T, JOO H, MATTHEWS I, et al. Hand keypoint detection in single images using multiview bootstrapping[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 4645-4653.
[31]	KOOIJ J F P, SCHNEIDER N, FLOHR F, et al. Context-Based Pedestrian Path Prediction[C]//European Conference on Computer Vision. Berlin: Springer, 2014: 618-633.
[32]	YAGI T, MANGALAM K, YONETANI R, et al. Future person localization in first-person videos[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscatawary: IEEE press, 2018: 7593-7602.
[33]	RASOULI A, KOTSERUBA I, KUNIC T, et al. Pie: A large-scale dataset and models for pedestrian intention estimation and trajectory prediction[C]//2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 6262-6271.
[34]	YANG S, LUO P, LOY C C, et al. Wider face: A face detection benchmark[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscatawary: IEEE Press, 2016: 5525-5533.
[35]	CAO Z, HIDALGO G, SIMON T, et al. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(1): 172-186. doi: 10.1109/TPAMI.2019.2929257
[36]	PyTorch. [EB/OL]. (2018-06-23) [2021-07-12]. https://pytorch.org/.
[37]	MILAN A, LEAL-TAIXÉ L, REID I, et al. MOT16: A benchmark for multi-object tracking[EB/OL]. (2016-03-03)[2021-08-06]. https://arxiv.orglabs11603.00831.
[38]	DENDORFER P, REZATOFIGHI H, MILAN A, et al. Mot20: A benchmark for multi object tracking in crowded scenes[EB/OL]. (2020-03-19)[2021-08-07]. https://arxiv.orglabs/2003.0903.
[39]	ZHANG P, OUYANG W L, ZHANG P F, et al. SR-LSTM: State refinement for LSTM towards pedestrian trajectory prediction[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 12077-12086.