Human motion direction prediction method based on eye tracking, pose and scene video
-
Abstract: Exoskeleton robots, as newly proposed smart devices for improving and extending human physical capability, likewise require an efficient and intelligent human-computer interaction system, and the first step of human-computer interaction is to accurately predict human behavior intention. From the perspective of the top-level control of exoskeleton robots, the state of research on human motion intention recognition and on the intelligent interaction capabilities of exoskeleton robots is reviewed, and the recognition of human motion direction is then studied. A multi-information-fusion network architecture for human motion intention recognition is proposed that combines eye-tracking information, position and attitude information, and scene video information, and wearing experiments with the acquisition devices were carried out. The proposed network was experimentally validated on the collected data. The results show that the proposed recognition system can predict the direction of movement during human motion.
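The excerpt does not include implementation details of the fusion network. As a rough illustration only, the following is a minimal PyTorch sketch consistent with the input dimensions listed in Table 2; the GRU-based fusion, the names (`DirectionNet`, `scene_fc`), and all layer sizes other than the input dimensions are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DirectionNet(nn.Module):
    """Sketch of a multi-information fusion network: per-frame scene
    features (2048-d, from a CNN backbone), gaze points (2-d), and
    position/attitude coordinates (3-d each, cf. Table 2) are
    concatenated and fed to a GRU that regresses the relative
    rotation angle used as the label."""
    def __init__(self, hidden=128):
        super().__init__()
        self.scene_fc = nn.Linear(2048, 64)   # compress scene feature
        self.gru = nn.GRU(64 + 2 + 3 + 3, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)      # relative rotation angle

    def forward(self, scene, gaze, pos, att):
        # all inputs: (batch, time, dim), already aligned to one rate
        x = torch.cat([self.scene_fc(scene), gaze, pos, att], dim=-1)
        out, _ = self.gru(x)
        return self.head(out[:, -1])          # predict from last step

# usage: 4 sequences of 10 aligned frames
net = DirectionNet()
angle = net(torch.randn(4, 10, 2048), torch.randn(4, 10, 2),
            torch.randn(4, 10, 3), torch.randn(4, 10, 3))
print(angle.shape)  # torch.Size([4, 1])
```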
-
Key words:
- eye tracking
- prediction method
- direction of movement
- human behavior
- human-computer interaction
-
| Layer name | Output size | Layer structure |
|---|---|---|
| conv1 | 112×112 | 7×7, 64, stride 2 |
| maxpool | 56×56 | 3×3 max pooling, stride 2 |
| conv2_x | 56×56 | |
| conv3_x | 28×28 | |
| conv4_x | 14×14 | |
| conv5_x | 7×7 | |
| avgpool | 1×1 | average pooling, 1000-d fully connected layer, softmax |
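The layer layout above is that of a ResNet backbone, and the 2048-dimensional scene feature vector in Table 2 matches the average-pool output of a bottleneck variant such as ResNet-50, though the exact depth is not recoverable from this excerpt. A minimal torchvision sketch of extracting that feature (ResNet-50 is an assumption):

```python
import torch
from torchvision import models

# Assumption: a bottleneck ResNet (2048-d avgpool output) serves as the
# scene-feature backbone; the 1000-d FC/softmax rows in the table are the
# ImageNet classification head, which is dropped for feature extraction.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()            # keep the pooled 2048-d feature
backbone.eval()

with torch.no_grad():
    frame = torch.randn(1, 3, 224, 224)      # one scene-video frame
    feat = backbone(frame)                   # shape: (1, 2048)
print(feat.shape)
```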
Table 2. Data types used to build the training set
| Data name | Data size | Sampling frequency/Hz |
|---|---|---|
| Position coordinates | 3×1 | 5 |
| Attitude coordinates | 3×1 | 5 |
| Scene feature vector | 2048×1 | 25 |
| Eye-tracking gaze data | 2×1 | 25 |
| Relative rotation angle (label) | 1×1 | 5 |
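Table 2 mixes two sampling rates: pose data and labels at 5 Hz, scene features and gaze at 25 Hz. How the streams are aligned is not stated in this excerpt; the sketch below assumes simple decimation (keeping every 5th of the 25 Hz samples) purely for illustration.

```python
import numpy as np

# Assumption: all streams start at t = 0; the 25 Hz streams are
# downsampled to the 5 Hz rate of the pose data and labels by taking
# every 5th sample (25 / 5 = 5).
def align_to_5hz(stream_25hz: np.ndarray, factor: int = 5) -> np.ndarray:
    return stream_25hz[::factor]

scene = np.random.randn(250, 2048)   # 10 s of scene features at 25 Hz
gaze = np.random.randn(250, 2)       # 10 s of gaze points at 25 Hz
pos = np.random.randn(50, 3)         # 10 s of position data at 5 Hz

scene_5, gaze_5 = align_to_5hz(scene), align_to_5hz(gaze)
assert len(scene_5) == len(gaze_5) == len(pos)   # 50 aligned samples
```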
Table 3. Results for the four types of training data
| Training data | MAE | MSE | R² |
|---|---|---|---|
| ① | 1.383 | 4.863 | 0.676 |
| ② | 1.337 | 4.413 | 0.705 |
| ③ | 1.415 | 4.681 | 0.688 |
| ④ | 1.484 | 5.873 | 0.608 |
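For reference, the three metrics reported in Table 3 can be computed as below. This is a generic sketch of the standard definitions, not the paper's evaluation code.

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray):
    """MAE, MSE and the coefficient of determination R^2 (Table 3)."""
    err = y_true - y_pred
    mae = np.abs(err).mean()
    mse = (err ** 2).mean()
    ss_res = (err ** 2).sum()                        # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2

# toy usage with random angle predictions
y = np.random.randn(100)
print(regression_metrics(y, y + 0.1 * np.random.randn(100)))
```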