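Three-dimensional human pose estimation based on multi-source image weakly-supervised learning

CAI Yiheng, WANG Xueyan, HU Shaobin, LIU Jiaqi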

Citation: CAI Yiheng, WANG Xueyan, HU Shaobin, et al. Three-dimensional human pose estimation based on multi-source image weakly-supervised learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(12): 2375-2384. doi: 10.13700/j.bh.1001-5965.2019.0387 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2019.0387
Funds:

National Key R&D Program of China 2017YFC1703302

Science and Technology Project of Beijing Municipal Education Commission KM201710005028

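Author information
    About the authors:

    CAI Yiheng   Female, Ph.D., associate professor. Main research interests: image processing and recognition, visual perception information processing, color science

    WANG Xueyan   Female, master's student. Main research interests: image and video processing

    HU Shaobin   Male, master's student. Main research interests: image processing

    LIU Jiaqi   Male, master's student. Main research interests: image and video processing and analysis

    Corresponding author:

    CAI Yiheng. E-mail: caiyiheng@bjut.edu.cn

  • CLC number: TP391.4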

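  • Abstract:

    Three-dimensional (3D) human pose estimation is a major research focus in computer vision. To address the lack of depth labels for depth images and the limited generalization ability of models caused by a lack of pose diversity, a 3D human pose estimation method based on weakly-supervised learning from multi-source images is proposed. First, images from multiple sources are fused during training to improve the generalization ability of the model. Second, a weakly-supervised learning method is proposed to address the shortage of labels. Finally, the design of the residual module is improved to raise pose estimation performance. Experimental results show that the improved network structure raises accuracy by 0.2% while reducing training time by about 28%, and that the proposed method achieves good estimation results on both depth images and color images.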

  • Figure 1.  Overall framework of 3D human pose estimation

    Figure 2.  Network structure framework based on weakly-supervised learning

    Figure 3.  Residual module

    Figure 4.  2D human pose estimation and corresponding heat-map results on the ITOP dataset

    Figure 5.  Accuracy of 3D wrist and knee joints on the ITOP dataset for different training models, based on the PDJ evaluation metric

    Figure 6.  Accuracy of 3D ankle and knee joints on the Human 3.6M dataset for different training models, based on the PDJ evaluation metric

    Figure 7.  3D human pose estimation on the ITOP dataset

    Figure 8.  3D human pose estimation on the Human 3.6M dataset
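Figures 5 and 6 and Table 4 report accuracy under the PDJ metric. As a rough illustration only, the sketch below uses one common definition of PDJ (percentage of detected joints): a joint counts as detected when the distance between the predicted and ground-truth positions is within a given fraction of the torso diameter. The function name `pdj`, the default `fraction` of 0.5, and the toy coordinates are assumptions made for this example; they are not values taken from the paper.

```python
import math

def pdj(predicted, ground_truth, torso_diameter, fraction=0.5):
    """Share of joints whose prediction lies within fraction * torso_diameter
    of the ground-truth position (hypothetical helper, not the paper's code)."""
    threshold = fraction * torso_diameter
    detected = sum(
        1 for p, t in zip(predicted, ground_truth) if math.dist(p, t) <= threshold
    )
    return detected / len(ground_truth)

# Toy 3D joints (arbitrary units): two predictions are close, one is far off.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

score = pdj(predicted, ground_truth, torso_diameter=1.0)
print(f"PDJ: {score:.3f}")
```

With a torso diameter of 1.0 and a fraction of 0.5, the first two joints fall inside the threshold and the third does not, so two of three joints count as detected.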

    Table 1.  Training images corresponding to different models

    Model                  Training data
                           Depth image datasets    Color image datasets
                           ITOP    K2HGD           MPII    Human 3.6M
    Ref. [13]
    M-H36M (proposed)
    I-H36M (proposed)
    IK-H36M (proposed)
    IKM-H36M (proposed)

    Table 2.  Comparison of accuracy, parameter count and training time among different models

    Model                       Accuracy / %    Parameters / 10^6    Training time / (s·batch^-1)
    Proposed model (131~128)    90.10           31.00                0.16
    Ref. [13] (131~256)         92.26           116.60               0.29
    Proposed model (333~128)    92.48           101.10               0.21
    Proposed model (333~256)    92.92           390.00               0.41
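The abstract's figures (training time down by about 28%, accuracy up by 0.2%) can be reproduced from Table 2, assuming the comparison is between Ref. [13] (131~256) and the proposed model (333~128); that pairing is an inference from the numbers, not something the page states. A minimal check, with the helper name `relative_reduction` chosen for illustration:

```python
# Values copied from Table 2 (accuracy in %, training time in s per batch).
baseline = {"accuracy": 92.26, "sec_per_batch": 0.29}  # Ref. [13] (131~256)
proposed = {"accuracy": 92.48, "sec_per_batch": 0.21}  # proposed model (333~128)

def relative_reduction(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return (old - new) / old * 100.0

time_reduction = relative_reduction(baseline["sec_per_batch"], proposed["sec_per_batch"])
accuracy_gain = proposed["accuracy"] - baseline["accuracy"]

print(f"training time reduction: {time_reduction:.1f}%")
print(f"accuracy gain: {accuracy_gain:.2f} percentage points")
```

Under that assumption the reduction comes to roughly 27.6% and the gain to 0.22 percentage points, which matches the rounded values in the abstract.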

    Table 3.  Comparison of accuracy, parameter count and training time for different numbers of stacked hourglass networks

    Model                       Accuracy / %    Parameters / 10^6    Training time / (s·batch^-1)
    Ref. [13] (2 stack)         92.26           116.60               0.29
    Proposed model (2 stack)    92.48           101.10               0.21
    Proposed model (4 stack)    92.83           185.30               0.32

    Table 4.  3D human pose estimation results of different training models on Human 3.6M test images, based on the PDJ evaluation metric

    Method    Accuracy / %
    Ankle  Knee  Wrist  Elbow  Shoulder  Average
    Ref. [13]              65.00  82.90  89.27  77.91  86.46  94.73  94.42  84.38
    M-H36M (proposed)      75.07  90.33  94.69  80.92  86.24  96.38  94.69  88.33
    I-H36M (proposed)      71.52  88.03  92.52  76.46  82.55  94.07  95.30  85.78
    IK-H36M (proposed)     79.23  91.60  94.40  77.61  81.40  92.69  93.34  87.18
    IKM-H36M (proposed)    74.98  91.06  94.33  78.65  85.21  95.77  95.16  87.88
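Each row of Table 4 carries eight values while the header names only five joints plus the average, so the column labels cannot be matched to every value from this page alone. What can be checked is that the last value in each row is the arithmetic mean of the seven values before it. The sketch below does only that; the helper name `mean_matches` and the 0.01 tolerance are choices made for this example.

```python
# Rows copied from Table 4: seven per-joint accuracies, then the reported average.
rows = {
    "Ref. [13]": [65.00, 82.90, 89.27, 77.91, 86.46, 94.73, 94.42, 84.38],
    "M-H36M": [75.07, 90.33, 94.69, 80.92, 86.24, 96.38, 94.69, 88.33],
    "I-H36M": [71.52, 88.03, 92.52, 76.46, 82.55, 94.07, 95.30, 85.78],
    "IK-H36M": [79.23, 91.60, 94.40, 77.61, 81.40, 92.69, 93.34, 87.18],
    "IKM-H36M": [74.98, 91.06, 94.33, 78.65, 85.21, 95.77, 95.16, 87.88],
}

def mean_matches(values, tolerance=0.01):
    """True if the last value equals the mean of the preceding ones, within tolerance."""
    *joints, reported = values
    return abs(sum(joints) / len(joints) - reported) < tolerance

checks = {name: mean_matches(values) for name, values in rows.items()}
for name, ok in checks.items():
    print(f"{name}: average consistent = {ok}")
```

All five rows pass this check, which supports reading the final column as an average over seven joint groups.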
  • [1] PARK S, HWANG J, KWAK N. 3D human pose estimation using convolutional neural networks with 2D pose information[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 156-169.
    [2] YANG W, OUYANG W, LI H, et al. End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2016: 3073-3082. https://www.researchgate.net/publication/311610573_End-to-End_Learning_of_Deformable_Mixture_of_Parts_and_Deep_Convolutional_Neural_Networks_for_Human_Pose_Estimation
    [3] ZE W K, FU Z S, HUI C, et al. Human pose estimation from depth images via inference embedded multi-task learning[C]//Proceedings of the 2016 ACM on Multimedia Conference. New York: ACM, 2016: 1227-1236. https://www.researchgate.net/publication/310819871_Human_Pose_Estimation_from_Depth_Images_via_Inference_Embedded_Multi-task_Learning
    [4] SHEN W, DENG K, BAI X, et al. Exemplar-based human action pose correction[J]. IEEE Transactions on Cybernetics, 2014, 44(7): 1053-1066. doi: 10.1109/TCYB.2013.2279071
    [5] GULER R A, KOKKINOS L, NEVEROVA N, et al. DensePose: Dense human pose estimation in the wild[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2018: 7297-7306. https://www.researchgate.net/publication/329754451_DensePose_Dense_Human_Pose_Estimation_in_the_Wild
    [6] RHODIN H, SALZMANN M, FUA P. Unsupervised geometry-aware representation for 3D human pose estimation[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 765-782. doi: 10.1007/978-3-030-01249-6_46
    [7] OMRAN M, LASSNER C, PONS-MOLL G, et al. Neural body fitting: Unifying deep learning and model based human pose and shape estimation[C]//International Conference on 3D Vision. Piscataway, NJ: IEEE Press, 2018: 484-494.
    [8] HAQUE A, PENG B, LUO Z, et al. Towards viewpoint invariant 3D human pose estimation[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 160-177. doi: 10.1007/978-3-319-46448-0_10
    [9] TOSHEV A, SZEGEDY C. DeepPose: Human pose estimation via deep neural networks[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2014: 1653-1660. https://www.researchgate.net/publication/259335300_DeepPose_Human_Pose_Estimation_via_Deep_Neural_Networks
    [10] CAO Z, SIMON T, WEI S E, et al. Realtime multi-person 2D pose estimation using part affinity fields[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2017: 7291-7299. https://www.researchgate.net/publication/310953055_Realtime_Multi-Person_2D_Pose_Estimation_using_Part_Affinity_Fields
    [11] WEI S E, RAMAKRISHNA V, KANADE T, et al. Convolutional pose machines[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2016: 4724-4732. https://www.researchgate.net/publication/319770228_Convolutional_Pose_Machines
    [12] NEWELL A, YANG K, DENG J, et al. Stacked hourglass networks for human pose estimation[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 483-499. doi: 10.1007/978-3-319-46484-8_29
    [13] YI Z X, XING H Q, XIAO S, et al. Towards 3D pose estimation in the wild: A weakly-supervised approach[C]//IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE Press, 2017: 398-407. https://www.researchgate.net/publication/322060193_Towards_3D_Human_Pose_Estimation_in_the_Wild_A_Weakly-Supervised_Approach
    [14] SAM J, MARK E. Clustered pose and nonlinear appearance models for human pose estimation[C]//Proceedings of the 21st British Machine Vision Conference, 2010: 12.1-12.11.
    [15] ANDRILUKA M, PISHCHULIN L, GEHLER P, et al. 2D human pose estimation: New benchmark and state of the art analysis[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2014: 3686-3693. https://www.researchgate.net/publication/269332682_2D_Human_Pose_Estimation_New_Benchmark_and_State_of_the_Art_Analysis
    [16] CATALIN I, DRAGOS P, VLAD O, et al. Human 3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1325-1339. https://www.ncbi.nlm.nih.gov/pubmed/26353306
    [17] HAN X F, LEUNG T, JIA Y Q, et al. MatchNet: Unifying feature and metric learning for patch-based matching[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2015: 3279-3286.
    [18] XU G H, LI M, CHEN L T, et al. Human pose estimation method based on single depth image[J]. IET Computer Vision, 2018, 12(6): 919-924. doi: 10.1049/iet-cvi.2017.0536
    [19] SI J L, ANTONI B. 3D human pose estimation from monocular images with deep convolutional neural network[C]//Asian Conference on Computer Vision. Berlin: Springer, 2014: 332-347. doi: 10.1007/978-3-319-16808-1_23
    [20] GHEZELGHIEH M F, KASTURI R, SARKAR S. Learning camera viewpoint using CNN to improve 3D body pose estimation[C]//International Conference on 3D Vision. Piscataway, NJ: IEEE Press, 2016: 685-693. https://www.researchgate.net/publication/308349713_Learning_camera_viewpoint_using_CNN_to_improve_3D_body_pose_estimation
    [21] CHEN C H, RAMANAN D. 3D human pose estimation = 2D pose estimation + matching[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2017: 5759-5767. https://www.researchgate.net/publication/320968127_3D_Human_Pose_Estimation_2D_Pose_Estimation_Matching
    [22] POPA A I, ZANFIR M, SMINCHISESCU C. Deep multitask architecture for integrated 2D and 3D human sensing[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2017: 4714-4723. https://www.researchgate.net/publication/320971314_Deep_Multitask_Architecture_for_Integrated_2D_and_3D_Human_Sensing
    [23] COLLOBERT R, KAVUKCUOGLU K, FARABET C. Torch7: A Matlab-like environment for machine learning[C]//Conference and Workshop on Neural Information Processing Systems, 2011: 1-6.
    [24] NIKOS K, GEORGIOS P, KOSTAS D. Convolutional mesh regression for single-image human shape reconstruction[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2019: 4510-4519. https://www.researchgate.net/publication/332960783_Convolutional_Mesh_Regression_for_Single-Image_Human_Shape_Reconstruction
    [25] CHENXU L, XIAO C, ALAN Y. OriNet: A fully convolutional network for 3D human pose estimation[C]//British Machine Vision Conference, 2018: 321-333. https://www.researchgate.net/publication/328939243_OriNet_A_Fully_Convolutional_Network_for_3D_Human_Pose_Estimation
Publication history
  • Received:  2019-07-10
  • Accepted:  2019-08-23
  • Published online:  2019-12-20