基于多任务辅助推理的近眼视线估计方法

王小东; 谢良; 闫慧炯; 闫野; 印二威; 李卫国

doi:10.13700/j.bh.1001-5965.2020.0700

基于多任务辅助推理的近眼视线估计方法

doi: 10.13700/j.bh.1001-5965.2020.0700

王小东^{1, 2},
谢良^{2, 3, ,},
闫慧炯²,
闫野^{2, 3},
印二威^{2, 3},
李卫国¹

1.
北京航空航天大学软件学院, 北京 100083
2.
天津(滨海)人工智能创新中心, 天津 300450
3.
军事科学院国防科技创新研究院, 北京 100071

基金项目:

国家自然科学基金 61901505

详细信息

通讯作者:
谢良, E-mail: xielnudt@gmail.com

中图分类号: TP391
计量
- 文章访问数: 346
- HTML全文浏览量: 93
- PDF下载量: 41
- 被引次数: 0
出版历程
- 收稿日期: 2020-12-18
- 录用日期: 2021-04-09
- 网络出版日期: 2022-06-20
- 整期出版日期: 2022-06-20

Near-eye gaze estimation based on multitasking auxiliary reasoning

WANG Xiaodong^{1, 2},
XIE Liang^{2, 3
, ,},
YAN Huijiong²,
YAN Ye^{2, 3},
YIN Erwei^{2, 3},
LI Weiguo¹

1.
College of Software, Beihang University, Beijing 100083, China
2.
Tianjin (Binhai) Artificial Intelligence Innovation Center, Tianjin 300450, China
3.
Defense Innovation Institute, Academy of Military Sciences, Beijing 100071, China

Funds:

National Natural Science Foundation of China 61901505

More Information

Corresponding author: XIE Liang, E-mail: xielnudt@gmail.com

摘要

摘要:
眼动交互是头戴式虚拟现实(VR)/增强现实(AR)设备的关键操控方式, 如何进行高精度、高鲁棒性的非标定视线估计是当前VR/AR眼动交互的核心问题之一, 高效、鲁棒的非标定视线估计需要大量的眼图训练数据和高效的算法结构做支撑。在现有基于深度学习的近眼视线估计方法的基础上, 通过添加多任务辅助推理模块, 增加网络结构的多阶段输出, 进行多任务联合训练, 在不增加视线估计测试耗时的前提下, 有效提升视线估计精度。在模型训练时, 从视线估计网络结构的多个中间阶段引出多个眼部特征的辅助推理并行网络头, 包括眼动图像的语义分割、虹膜边界框及眼部轮廓信息, 为原始视线估计网络提供多阶段中继监控, 在不增加训练数据的基础上, 有效提升视线估计网络的测试精度。在国际公开数据集Acomo-14与OpenEDS2020上的验证实验表明, 与无辅助推理的网络相比, 所提方法精度分别得到了21.74%与18.91%的效果提升, 平均角度误差分别减少到1.38°与2.01°。
- 视线估计 /
- 增强现实(AR) /
- 人机交互 /
- 多任务学习 /
- 辅助推理
Abstract:
Eye-tracking interaction is the key control method for head-mounted virtual reality (VR)/augmented reality (AR) devices and non-calibrated gaze estimation is one of the core problem in current VR/AR eye-tracking interactions. Efficient and robust non-calibrated gaze estimation requires a large amount of training data and an efficient network structure. Based on the existing deep-learning-based near-eye gaze estimation method, by adding multitasking auxiliary reasoning and increasing the multi-stage output of the network structure for joint multi-task training, we achieve an effective improvement of gaze estimation accuracy without increasing the refer time compared to the original gaze estimation network. During model training, multiple intermediate stages of the gaze estimation network structure are used to derive multiple parallel network headers for auxiliary reasoning about eye features, including semantic segmentation of eye images, iris border frames, and eye contour information, to provide multi-stage relay monitoring for the original gaze estimation network, which effectively improves the generalization capability of the gaze estimation network without increasing the training data. Experiments on the open datasets Acomo-14 and OpenEDS2020 show that the accuracy of the algorithm is improved by 21.74% and 18.91%, respectively, and the average gaze estimation error is reduced to 1.38 degrees and 2.01 degrees, compared with the network without auxiliary reasoning.
- gaze estimation /
- augmented reality (AR) /
- human-computer interaction /
- multitask learning /
- auxiliary reasoning

HTML全文

图 1 ARGazeNet网络结构

Figure 1. Network architecture of ARGazeNet

下载: 全尺寸图片幻灯片

图 2 ARGazeNet的架构细节

Figure 2. Architecture details of ARGazeNet

下载: 全尺寸图片幻灯片

图 3 用于对比实验的Kim模型

Figure 3. Kim model for comparison experiments

下载: 全尺寸图片幻灯片

表 1 数据集Acomo-14和openEDS2020上跨主体评估的平均角度误差

Table 1. Mean angular error in cross-subjects on Acomo-14 and openEDS2020 datasets

模型	平均角度误差/(°)
模型	Acomo-14	openEDS2020
Kim模型^[17]	2.51	3.51
不含辅助推理模块的ARGazeNet	1.68	2.39
ARGazeNet	1.38	2.01
使用分割图作为输入的ARGazeNet	5.94	4.95

下载: 导出CSV

表 2 测试硬件上的推理性能

Table 2. Inference performance on tested hardware

模型	总耗时/ms
Kim模型	2.34
不含辅助推理模块的ARGazeNet	2.47
ARGazeNet	2.47

下载: 导出CSV

表 3 辅助推理模块的影响

Table 3. Effectiveness of auxiliary reasoning module

模型	平均角度误差/(°)
模型	Acomo-14	openEDS2020
仅使用基础CNN模型	1.68	2.39
添加眼部语义分割图预测模块	1.60	2.20
添加虹膜边界框预测模块	1.46	2.30
添加眼部关键轮廓图预测模块	1.63	2.32
添加眼部语义分割图预测模块与虹膜边界框预测模块	1.41	2.15
添加眼部语义分割图预测模块与眼部关键轮廓图预测模块	1.54	2.18
添加虹膜边界框预测模块与眼部关键轮廓图预测模块	1.43	2.26
ARGazeNet	1.38	2.01

下载: 导出CSV

参考文献(21)

[1]	KYTÖ M, ENS B, PIUMSOMBOON T, et al. Pinpointing: Precise head-and eye-based target selection for augmented reality[C]//Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018: 1-14.
[2]	SMITH P, SHAH M, DA VITORIA LOBO N. Determining driver visual attention with one camera[J]. IEEE Transactions on Intelligent Transportation Systems, 2003, 4(4): 205-218. doi: 10.1109/TITS.2003.821342
[3]	ALBERT R, PATNEY A, LUEBKE D, et al. Latency requirements for foveated rendering in virtual reality[J]. ACM Transactions on Applied Perception, 2017, 14(4): 1-13.
[4]	PATNEY A, SALVI M, KIM J, et al. Towards foveated rendering for gaze-tracked virtual reality[J]. ACM Transactions on Graphics, 2016, 35(6): 1-12.
[5]	ZHANG X, SUGANO Y, BULLING A. Evaluation of appearance-based methods and implications for gaze-based applications[C]//Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019: 1-13.
[6]	ZHANG X, SUGANO Y, FRITZ M, et al. Appearance-based gaze estimation in the wild[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 4511-4520.
[7]	ZHANG X, SUGANO Y, FRITZ M, et al. MPⅡGaze: Real-world dataset and deep appearance-based gaze estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 41(1): 162-175.
[8]	CHENG Y, ZHANG X, LU F, et al. Gaze estimation by exploring two-eye ssymmetry[J]. IEEE Transactions on Image Processing, 2020, 29: 5259-5272. doi: 10.1109/TIP.2020.2982828
[9]	KRAFKA K, KHOSLA A, KELLNHOFER P, et al. Eye tracking for everyone[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2176-2184.
[10]	PARK S, SPURR A, HILLIGES O. Deep pictorial gaze estimation[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 721-738.
[11]	PARK S, ZHANG X, BULLING A, et al. Learning to find eye region landmarks for remote gaze estimation in unconstrained settings[C]//Proceedings of the 2018 ACM Symposium on Eye Tracking Research and Applications. New York: ACM, 2018: 1-10.
[12]	YU Y, LIU G, ODOBEZ J M. Deep multitask gaze estimation with a constrained landmark-gaze model[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 456-474.
[13]	GARBIN S J, SHEN Y, SCHUETZ I, et al. OpenEDS: Open eye dataset[EB/OL]. (2019-05-17)[2020-12-01]. https://arxiv.org/abs/1905.03702.
[14]	PALMERO C, SHARMA A, BEHRENDT K, et al. OpenEDS2020: Open eyes dataset[EB/OL]. (2020-05-08)[2020-12-01]. https://arxiv.org/abs/2005.03876.
[15]	CHAUDHARY A K, KOTHARI R, ACHARYA M, et al. RITnet: Real-time semantic segmentation of the eye for gaze tracking[C]//2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Piscataway: IEEE Press, 2019: 3698-3702.
[16]	PALMERO C, KOMOGORTSEV O V, TALATHI S S. Benefits of temporal information for appearance-based gaze estimation[C]//Proceedings of the 2020 ACM Symposium on Eye Tracking Research and Applications. New York: ACM, 2020: 1-5.
[17]	KIM J, STENGEL M, MAJERCIK A, et al. NVGaze: An anatomically-informed dataset for low-latency, near-eye gaze estimation[C]//Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019: 1-12.
[18]	WU Z, RAJENDRAN S, VAN AS T, et al. EyeNet: A multi-task deep network for off-axis eye gaze estimation[C]//2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Piscataway: IEEE Press, 2019: 3683-3687.
[19]	TAN M, LE Q. EfficientNet: Rethinking model scaling for convolutional neural networks[C]//International Conference on Machine Learning, 2019: 6105-6114.
[20]	QIN X, ZHANG Z, HUANG C, et al. BASNet: Boundary-aware salient object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7479-7489.
[21]	CHETLUR S, WOOLLEY C, VANDERMERSCH P, et al. cuDNN: Efficient primitives for deep learning[EB/OL]. (2014-12-18)[2020-12-01]. https://arxiv.org/abs/1410.0759.