Near-eye gaze estimation based on multitasking auxiliary reasoning

WANG Xiaodong, XIE Liang, YAN Huijiong, YAN Ye, YIN Erwei, LI Weiguo

Citation: WANG Xiaodong, XIE Liang, YAN Huijiong, et al. Near-eye gaze estimation based on multitasking auxiliary reasoning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(6): 1030-1037. doi: 10.13700/j.bh.1001-5965.2020.0700 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2020.0700
Funds: National Natural Science Foundation of China 61901505

Corresponding author: XIE Liang, E-mail: xielnudt@gmail.com

  • CLC number: TP391

  • Abstract:

    Eye-movement interaction is a key control modality for head-mounted virtual reality (VR) and augmented reality (AR) devices, and achieving high-accuracy, highly robust calibration-free gaze estimation is one of the core problems in current VR/AR eye-movement interaction. Efficient and robust calibration-free gaze estimation requires large amounts of eye-image training data and an efficient network architecture. Building on existing deep-learning-based near-eye gaze estimation methods, this work adds a multi-task auxiliary reasoning module that introduces multi-stage outputs into the network and performs joint multi-task training, effectively improving gaze estimation accuracy without increasing inference time. During training, parallel auxiliary inference heads for multiple eye features, including semantic segmentation of the eye image, the iris bounding box, and eye contour information, are branched off several intermediate stages of the gaze estimation network. They provide multi-stage intermediate supervision for the original gaze estimation network and effectively improve its test accuracy without additional training data. Validation experiments on the public datasets Acomo-14 and OpenEDS2020 show that, compared with the network without auxiliary reasoning, the proposed method improves accuracy by 21.74% and 18.91%, respectively, reducing the mean angular error to 1.38° and 2.01°.
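    As an illustration of the multi-stage auxiliary supervision scheme summarized above, the following is a minimal PyTorch-style sketch in which heads for eye segmentation, the iris bounding box, and the eye contour are branched off intermediate backbone stages and trained jointly with the gaze output. The module names, layer sizes, and loss weights are illustrative assumptions, not the authors' actual ARGazeNet implementation.

```python
# Minimal sketch of multi-task auxiliary supervision for near-eye gaze estimation.
# NOTE: layer sizes, head designs, and loss weights are illustrative assumptions,
# not the ARGazeNet architecture described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvStage(nn.Module):
    """One downsampling stage of the gaze backbone."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class AuxGazeNet(nn.Module):
    def __init__(self, num_seg_classes=4):
        super().__init__()
        self.stage1 = ConvStage(1, 32)    # grayscale near-eye image in
        self.stage2 = ConvStage(32, 64)
        self.stage3 = ConvStage(64, 128)
        # Main gaze regression head: (yaw, pitch).
        self.gaze_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 2)
        )
        # Auxiliary heads branched off intermediate stages (used only during training).
        self.seg_head = nn.Conv2d(32, num_seg_classes, 1)    # eye-region segmentation
        self.box_head = nn.Sequential(                        # iris bounding box (x, y, w, h)
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 4)
        )
        self.contour_head = nn.Conv2d(128, 1, 1)              # eye contour map

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        gaze = self.gaze_head(f3)
        if self.training:  # auxiliary outputs are only needed for joint training
            return gaze, self.seg_head(f1), self.box_head(f2), self.contour_head(f3)
        return gaze

def joint_loss(outputs, targets, w_seg=0.5, w_box=0.5, w_contour=0.5):
    """Weighted sum of gaze loss and auxiliary losses (weights are assumptions).

    Auxiliary targets are assumed to be resized to each head's output resolution.
    """
    gaze, seg, box, contour = outputs
    gaze_gt, seg_gt, box_gt, contour_gt = targets
    loss = F.mse_loss(gaze, gaze_gt)
    loss += w_seg * F.cross_entropy(seg, seg_gt)
    loss += w_box * F.smooth_l1_loss(box, box_gt)
    loss += w_contour * F.binary_cross_entropy_with_logits(contour, contour_gt)
    return loss
```

    Because the auxiliary heads are evaluated only in training mode, inference runs the plain gaze branch alone, which is consistent with the identical per-frame timings reported in Table 2.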

     

  • Figure 1.  Network architecture of ARGazeNet

    Figure 2.  Architecture details of ARGazeNet

    Figure 3.  Kim model for comparison experiments

    Table 1.  Mean angular error in cross-subject evaluation on the Acomo-14 and OpenEDS2020 datasets

    Model                                            Mean angular error/(°)
                                                     Acomo-14    OpenEDS2020
    Kim model [17]                                   2.51        3.51
    ARGazeNet without auxiliary reasoning module     1.68        2.39
    ARGazeNet                                        1.38        2.01
    ARGazeNet with segmentation maps as input        5.94        4.95
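    For reference, the mean angular error reported in these tables is conventionally the angle between the predicted and ground-truth gaze directions. A minimal sketch follows; the function name and the 3D direction-vector format are illustrative assumptions.

```python
import numpy as np

def angular_error_deg(pred, gt):
    """Angle in degrees between predicted and ground-truth gaze direction vectors.

    pred, gt: arrays of shape (N, 3); vectors need not be pre-normalized.
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# Mean angular error over a test set:
# mae = angular_error_deg(pred_dirs, gt_dirs).mean()
```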

    Table 2.  Inference performance on the test hardware

    Model                                            Total time/ms
    Kim model                                        2.34
    ARGazeNet without auxiliary reasoning module     2.47
    ARGazeNet                                        2.47

    Table 3.  Effect of the auxiliary reasoning modules

    Model                                                          Mean angular error/(°)
                                                                   Acomo-14    OpenEDS2020
    Base CNN model only                                            1.68        2.39
    + eye semantic segmentation prediction module                  1.60        2.20
    + iris bounding box prediction module                          1.46        2.30
    + eye contour map prediction module                            1.63        2.32
    + segmentation and iris bounding box modules                   1.41        2.15
    + segmentation and eye contour modules                         1.54        2.18
    + iris bounding box and eye contour modules                    1.43        2.26
    ARGazeNet                                                      1.38        2.01
  • [1] KYTÖ M, ENS B, PIUMSOMBOON T, et al. Pinpointing: Precise head- and eye-based target selection for augmented reality[C]//Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018: 1-14.
    [2] SMITH P, SHAH M, DA VITORIA LOBO N. Determining driver visual attention with one camera[J]. IEEE Transactions on Intelligent Transportation Systems, 2003, 4(4): 205-218. doi: 10.1109/TITS.2003.821342
    [3] ALBERT R, PATNEY A, LUEBKE D, et al. Latency requirements for foveated rendering in virtual reality[J]. ACM Transactions on Applied Perception, 2017, 14(4): 1-13.
    [4] PATNEY A, SALVI M, KIM J, et al. Towards foveated rendering for gaze-tracked virtual reality[J]. ACM Transactions on Graphics, 2016, 35(6): 1-12.
    [5] ZHANG X, SUGANO Y, BULLING A. Evaluation of appearance-based methods and implications for gaze-based applications[C]//Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019: 1-13.
    [6] ZHANG X, SUGANO Y, FRITZ M, et al. Appearance-based gaze estimation in the wild[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 4511-4520.
    [7] ZHANG X, SUGANO Y, FRITZ M, et al. MPIIGaze: Real-world dataset and deep appearance-based gaze estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 41(1): 162-175.
    [8] CHENG Y, ZHANG X, LU F, et al. Gaze estimation by exploring two-eye asymmetry[J]. IEEE Transactions on Image Processing, 2020, 29: 5259-5272. doi: 10.1109/TIP.2020.2982828
    [9] KRAFKA K, KHOSLA A, KELLNHOFER P, et al. Eye tracking for everyone[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2176-2184.
    [10] PARK S, SPURR A, HILLIGES O. Deep pictorial gaze estimation[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 721-738.
    [11] PARK S, ZHANG X, BULLING A, et al. Learning to find eye region landmarks for remote gaze estimation in unconstrained settings[C]//Proceedings of the 2018 ACM Symposium on Eye Tracking Research and Applications. New York: ACM, 2018: 1-10.
    [12] YU Y, LIU G, ODOBEZ J M. Deep multitask gaze estimation with a constrained landmark-gaze model[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 456-474.
    [13] GARBIN S J, SHEN Y, SCHUETZ I, et al. OpenEDS: Open eye dataset[EB/OL]. (2019-05-17)[2020-12-01]. https://arxiv.org/abs/1905.03702.
    [14] PALMERO C, SHARMA A, BEHRENDT K, et al. OpenEDS2020: Open eyes dataset[EB/OL]. (2020-05-08)[2020-12-01]. https://arxiv.org/abs/2005.03876.
    [15] CHAUDHARY A K, KOTHARI R, ACHARYA M, et al. RITnet: Real-time semantic segmentation of the eye for gaze tracking[C]//2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Piscataway: IEEE Press, 2019: 3698-3702.
    [16] PALMERO C, KOMOGORTSEV O V, TALATHI S S. Benefits of temporal information for appearance-based gaze estimation[C]//Proceedings of the 2020 ACM Symposium on Eye Tracking Research and Applications. New York: ACM, 2020: 1-5.
    [17] KIM J, STENGEL M, MAJERCIK A, et al. NVGaze: An anatomically-informed dataset for low-latency, near-eye gaze estimation[C]//Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019: 1-12.
    [18] WU Z, RAJENDRAN S, VAN AS T, et al. EyeNet: A multi-task deep network for off-axis eye gaze estimation[C]//2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Piscataway: IEEE Press, 2019: 3683-3687.
    [19] TAN M, LE Q. EfficientNet: Rethinking model scaling for convolutional neural networks[C]//International Conference on Machine Learning, 2019: 6105-6114.
    [20] QIN X, ZHANG Z, HUANG C, et al. BASNet: Boundary-aware salient object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7479-7489.
    [21] CHETLUR S, WOOLLEY C, VANDERMERSCH P, et al. cuDNN: Efficient primitives for deep learning[EB/OL]. (2014-12-18)[2020-12-01]. https://arxiv.org/abs/1410.0759.
Publication history
  • Received:  2020-12-18
  • Accepted:  2021-04-09
  • Published online:  2022-06-20
  • Issue date:  2022-06-20
