Journal of Beijing University of Aeronautics and Astronautics, 2019, Vol. 45, Issue (12): 2375-2384. doi: 10.13700/j.bh.1001-5965.2019.0387

Three-dimensional human pose estimation based on multi-source image weakly-supervised learning

CAI Yiheng, WANG Xueyan, HU Shaobin, LIU Jiaqi

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2019-07-10 Online: 2019-12-20 Published: 2019-12-31
  • Corresponding author: CAI Yiheng, E-mail: caiyiheng@bjut.edu.cn
  • About the authors: CAI Yiheng, female, Ph.D., associate professor; main research interests: image processing and recognition, visual perception information processing, and color science. WANG Xueyan, female, master's student; main research interests: image and video processing. HU Shaobin, male, master's student; main research interest: image processing. LIU Jiaqi, male, master's student; main research interests: image and video processing and analysis.
  • Supported by:
    National Key R&D Program of China (2017YFC1703302); Science and Technology Project of Beijing Municipal Education Commission (KM201710005028)

Abstract: Three-dimensional (3D) human pose estimation is an active research topic in computer vision. To address the lack of depth labels for depth images and the poor model generalization caused by limited pose variety, this paper proposes a 3D human pose estimation method based on weakly-supervised learning from multi-source images. The method has three main components. First, training on fused multi-source images improves the generalization ability of the model. Second, a weakly-supervised learning approach compensates for the shortage of labels. Third, the design of the residual module is improved to raise pose estimation accuracy. Experimental results show that, compared with the original network, the improved network increases accuracy by 0.2% while reducing training time by about 28%, and that the proposed method achieves good estimation results on both depth images and color images.

Keywords: human pose estimation, hourglass network, weak supervision, multi-source image, depth image
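
The abstract does not give the loss used for weakly-supervised training, so the following is only a minimal sketch of one common way to train on a mix of samples with and without depth labels: the 2D joint error is always penalized, and a depth term is added only when depth labels are available. The function name `weakly_supervised_loss`, the parameter `depth_weight`, and the squared-error form are all illustrative assumptions, not the authors' formulation.

```python
from typing import Optional, Sequence, Tuple

Joint2D = Tuple[float, float]


def weakly_supervised_loss(
    pred_2d: Sequence[Joint2D],
    pred_depth: Sequence[float],
    gt_2d: Sequence[Joint2D],
    gt_depth: Optional[Sequence[float]],
    depth_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Mean squared 2D joint error, plus a weighted mean squared depth
    error only when depth labels exist for this sample (assumed form)."""
    n = len(gt_2d)
    loss_2d = sum(
        (px - gx) ** 2 + (py - gy) ** 2
        for (px, py), (gx, gy) in zip(pred_2d, gt_2d)
    ) / n
    if gt_depth is None:
        # Sample without depth labels: supervise the 2D joints only.
        return loss_2d
    loss_depth = sum((p - g) ** 2 for p, g in zip(pred_depth, gt_depth)) / n
    return loss_2d + depth_weight * loss_depth


pred_2d = [(0.0, 0.0), (1.0, 1.0)]
gt_2d = [(0.0, 0.0), (1.0, 3.0)]
pred_depth = [1.0, 2.0]

# Sample lacking depth labels: only the 2D term contributes.
loss_unlabeled = weakly_supervised_loss(pred_2d, pred_depth, gt_2d, None)
# Fully labeled sample: the depth term is added with weight 0.5.
loss_labeled = weakly_supervised_loss(
    pred_2d, pred_depth, gt_2d, [1.0, 4.0], depth_weight=0.5
)
```

Under this assumed design, samples from both sources can share one training batch, since a missing depth label simply switches off the depth term instead of excluding the sample.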
