Human 3D posture detection and modeling based on automatic variational correction in multi-vision

赵晓冬; 陈凯; 黄煜杰; 王鹏飞; 王子源

doi: 10.13700/j.bh.1001-5965.2024.0070

1. College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
2. The 28th Research Institute of China Electronics Technology Group Corporation, Nanjing 210007, China

Funds:

National Natural Science Foundation of China (52202417); China Postdoctoral Science Foundation (2022TQ0155, 2022M721605); Open Project Program of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (VRLAB2023A02); Young Elite Scientists Sponsorship Program by CAST (2023QNRC001); Young Elite Scientists Sponsorship Program by the Jiangsu Association for Science and Technology (JSTJ-2023-XH032)

Corresponding author: E-mail: chen_kai@nuaa.edu.cn

CLC number: TP391.4

Publication history
- Received: 2024-01-30
- Accepted: 2024-05-14
- Published online: 2024-06-14
- Issue published: 2026-04-30

Abstract:
Most existing methods that build a 3D human body model from 2D pose points on the body surface suffer from jitter across consecutive frames and local distortion of the reconstructed model. To address these problems, this paper proposes a method for detecting 3D pose points inside the human body based on the skinned multi-person linear (SMPL) model. A clustering algorithm maps the in-body 2D pose points observed from multiple views to 3D pose points in the real scene, and a Kalman filter is introduced to denoise the pose points. When the 3D human body model is built from the 3D pose points, the gradient descent regression network is corrected with an automatic variational method, yielding an end-to-end 3D human modeling network, SMPL-VAE, whose local modeling conforms better to the structure of human motion while the overall body proportions are preserved. Tests on the public Shelf dataset show that, compared with other methods, the proposed method improves the mean per-joint position error (MPJPE) by 3.88, 7.56, and 12.88, and the percentage of correct keypoints (PCK) by 3.5, 6.91, and 9, respectively, and that it matches pose points accurately for different targets.
Keywords: multi-vision; skinned multi-person linear model; 3D pose detection; Kalman filter; variational autoencoder
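The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: MPJPE as the mean Euclidean distance between predicted and ground-truth joints, and PCK as the share of joints whose error does not exceed a threshold. The function names, the toy data, and the threshold value are hypothetical; this page does not state the threshold used in the paper.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over all joints."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pck(pred, gt, threshold):
    """Percentage of correct keypoints: joints with error <= threshold (assumed value)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    errors = np.linalg.norm(pred - gt, axis=-1)
    return float((errors <= threshold).mean() * 100.0)

# Toy data: 4 joints, each prediction displaced along x only.
gt = np.zeros((4, 3))
pred = gt.copy()
pred[:, 0] = [10.0, 20.0, 30.0, 200.0]

print(mpjpe(pred, gt))       # mean of the four displacements
print(pck(pred, gt, 150.0))  # share of joints within the assumed threshold
```

A lower MPJPE and a higher PCK both indicate better pose estimates, which is why an "improvement" in MPJPE corresponds to a reduction of the error.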

Figure 1. Framework for detecting 3D pose points from multiple views

Figure 2. Schematic of the 2D pose point recovery process in each view

Figure 3. 3D coordinate regression

Figure 4. Comparison of Kalman filtering effects
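The Kalman filtering step can be sketched as follows. This is a generic constant-velocity Kalman filter applied independently to each coordinate of one joint trajectory, not the paper's configuration: the state model, the noise variances, and the function name `kalman_smooth` are all assumptions made for illustration.

```python
import numpy as np

def kalman_smooth(track, dt=1.0, process_var=1e-2, meas_var=1.0):
    """Filter a (T, 3) joint trajectory with a constant-velocity Kalman filter.

    State per axis: [position, velocity]. Noise levels are illustrative guesses.
    """
    track = np.asarray(track, dtype=float)
    F = np.array([[1.0, dt], [0.0, 1.0]])            # state transition
    H = np.array([[1.0, 0.0]])                       # only position is observed
    Q = process_var * np.array([[dt**4 / 4, dt**3 / 2],
                                [dt**3 / 2, dt**2]])  # process noise
    R = np.array([[meas_var]])                       # measurement noise

    out = np.empty_like(track)
    for axis in range(track.shape[1]):
        x = np.array([track[0, axis], 0.0])          # first measurement, zero velocity
        P = np.eye(2)
        for t in range(track.shape[0]):
            if t > 0:
                # Predict the next state and its covariance.
                x = F @ x
                P = F @ P @ F.T + Q
            # Update with the measurement at frame t.
            z = np.atleast_1d(track[t, axis])
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ (z - H @ x)
            P = (np.eye(2) - K @ H) @ P
            out[t, axis] = x[0]
    return out

def rmse(a, b):
    """Root-mean-square error between two trajectories."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic joint moving at constant velocity, observed with unit-variance noise.
rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
true_track = np.stack([0.5 * t, -0.2 * t, np.ones_like(t)], axis=1)
noisy_track = true_track + rng.normal(0.0, 1.0, size=true_track.shape)
filtered_track = kalman_smooth(noisy_track)

print("raw RMSE:     ", rmse(noisy_track, true_track))
print("filtered RMSE:", rmse(filtered_track, true_track))
```

On this synthetic trajectory the filtered error is lower than the raw error, which is the kind of jitter reduction the figure compares.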

Figure 5. Framework for 3D human body modeling based on automatic variational correction
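This page does not describe the internals of SMPL-VAE, so the sketch below shows only the two generic ingredients of a variational autoencoder as introduced in reference [13]: the reparameterization step and the closed-form KL term against a standard normal prior. The latent size, the 72-dimensional stand-in for an SMPL pose vector, the squared-error reconstruction term, and all function names are assumptions for illustration.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return float(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var))

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Squared-error reconstruction term plus a beta-weighted KL term."""
    recon = float(np.sum((x - x_hat) ** 2))
    return recon + beta * kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
mu = np.zeros(32)              # hypothetical latent size
log_var = np.zeros(32)
z = reparameterize(mu, log_var, rng)
x = rng.standard_normal(72)    # stand-in for a 72-D SMPL pose vector
print(z.shape, kl_to_standard_normal(mu, log_var), vae_loss(x, x, mu, log_var))
```

The KL term pulls the encoded distribution toward the prior, which is one plausible way a variational correction could keep regressed pose parameters within a learned range of body configurations.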

Figure 6. Human pose point detection results in key frames from each view

Figure 7. 3D pose detection results in key frames

Table 1. Comparison with previous work on the Shelf dataset (%)

Method	e_MPJPE	P_CK
SMPL^[11]	107.12	86.24
SMPLify^[19]	109.30	88.76
SPEC^[20]	108.62	86.42
Shape-aware^[10]	105.62	85.35
Ours	101.74	92.26

Note: Bold values in the original table indicate the best results.
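The margins implied by Table 1 can be recomputed with a few lines of arithmetic. The numbers below are copied from the table; the variable names are hypothetical, and the rounding to two decimals is only there to suppress floating-point noise.

```python
# (MPJPE, PCK) values copied from Table 1.
table = {
    "SMPL": (107.12, 86.24),
    "SMPLify": (109.30, 88.76),
    "SPEC": (108.62, 86.42),
    "Shape-aware": (105.62, 85.35),
}
ours = (101.74, 92.26)

# MPJPE is an error (lower is better); PCK is an accuracy (higher is better).
margins = {
    name: (round(mpjpe - ours[0], 2), round(ours[1] - pck, 2))
    for name, (mpjpe, pck) in table.items()
}

for name, (d_mpjpe, d_pck) in margins.items():
    print(f"{name}: MPJPE lower by {d_mpjpe:.2f}, PCK higher by {d_pck:.2f}")
```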

References (32)

[1]	CAO Z, HIDALGO G, SIMON T, et al. OpenPose: realtime multi-person 2D pose estimation using part affinity fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(1): 172-186.
[2]	ZHANG W Q, FANG J M, WANG X G, et al. EfficientPose: efficient human pose estimation with neural architecture search[J]. Computational Visual Media, 2021, 7(3): 335-347.
[3]	SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5686-5696.
[4]	TOSHEV A, SZEGEDY C. DeepPose: human pose estimation via deep neural networks[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 1653-1660.
[5]	SUN Y, YE Y, LIU W, et al. Human mesh recovery from monocular images via a skeleton-disentangled representation[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 5348-5357.
[6]	PAVLAKOS G, CHOUTAS V, GHORBANI N, et al. Expressive body capture: 3D hands, face, and body from a single image[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 10967-10977.
[7]	LIU S C, SAITO S, CHEN W K, et al. Learning to infer implicit surfaces without 3D supervision[J]. Advances in Neural Information Processing Systems, 2019, 32: 8295-8306.
[8]	FANG H S, XIE S Q, TAI Y W, et al. RMPE: regional multi-person pose estimation[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2353-2362.
[9]	LI W B, WANG Z, YIN B Y, et al. Rethinking on multi-stage networks for human pose estimation[EB/OL]. (2019-01-01) [2023-11-25]. https://doi.org/10.48550/arXiv.1901.00148.
[10]	DONG Z J, SONG J, CHEN X, et al. Shape-aware multi-person pose estimation from multi-view images[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 11138-11148.
[11]	LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34(6): 248.
[12]	LI Z G, OSKARSSON M, HEYDEN A. 3D human pose and shape estimation through collaborative learning and multi-view model-fitting[C]//Proceedings of the 2021 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2021: 1887-1896.
[13]	KINGMA D P, WELLING M. Auto-encoding variational Bayes[EB/OL]. (2013-12-20) [2023-11-30]. https://doi.org/10.48550/arXiv.1312.6114.
[14]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[15]	ISHWARYA K, ALICE NITHYA A. Squirrel search optimization with deep convolutional neural network for human pose estimation[J]. Computers, Materials & Continua, 2023, 74(3): 6081-6099.
[16]	TEKIN B, KATIRCIOGLU I, SALZMANN M, et al. Structured prediction of 3D human pose with deep neural networks[EB/OL]. (2016-05-17) [2023-12-15]. https://doi.org/10.48550/arXiv.1605.05180.
[17]	MOON G, CHANG J Y, LEE K M. Camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 10132-10141.
[18]	TU H Y, WANG C Y, ZENG W G. VoxelPose: towards multi-camera 3D human pose estimation in wild environment[C]//Computer Vision–ECCV 2020. Berlin: Springer, 2020: 197-212.
[19]	BOGO F, KANAZAWA A, LASSNER C, et al. Keep it SMPL: automatic estimation of 3D human pose and shape from a single image[C]//Computer Vision–ECCV 2016. Berlin: Springer, 2016: 561-578.
[20]	KOCABAS M, HUANG C P, TESCH J, et al. SPEC: seeing people in the wild with an estimated camera[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 11015-11025.
[21]	KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7122-7131.
[22]	KWON O H, TANKE J, GALL J. Recursive Bayesian filtering for multiple human pose tracking from multiple cameras[C]//Computer Vision–ACCV 2020. Berlin: Springer, 2021: 438-453.
[23]	HUANG Y J, CHEN K, WANG Z Y, et al. A dense pedestrian tracking method based on fusion features under multi-vision[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(7): 2513-2525 (in Chinese).
[24]	FIERARU M, ZANFIR M, SZENTE T, et al. REMIPS: physically consistent 3D reconstruction of multiple interacting people under weak supervision[J]. Advances in Neural Information Processing Systems, 2021, 34: 19385-19397.
[25]	TULSIANI S, EFROS A A, MALIK J. Multi-view consistency as supervisory signal for learning shape and pose prediction[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2897-2905.
[26]	KE S R, ZHU L J, HWANG J N, et al. Real-time 3D human pose estimation from monocular view with applications to event detection and video gaming[C]//Proceedings of the 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance. Piscataway: IEEE Press, 2010: 489-496.
[27]	SHI J R, WANG D, SHANG F H, et al. Research advances on stochastic gradient descent algorithms[J]. Acta Automatica Sinica, 2021, 47(9): 2103-2119 (in Chinese).
[28]	WEI L H, ZHENG C, HU Y J. Oriented object detection in aerial images based on the scaled smooth L1 loss function[J]. Remote Sensing, 2023, 15(5): 1350.
[29]	RAMAKRISHNA V, MUNOZ D, HEBERT M, et al. Pose machines: articulated pose estimation via inference machines[C]//Computer Vision–ECCV 2014. Berlin: Springer, 2014: 33-47.
[30]	BELAGIANNIS V, AMIN S, ANDRILUKA M, et al. 3D pictorial structures for multiple human pose estimation[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 1669-1676.
[31]	TIAN W, GAO Z, TAN D Y. Single-view multi-human pose estimation by attentive cross-dimension matching[J]. Frontiers in Neuroscience, 2023, 17: 1201088.
[32]	RANI C J, DEVARAKONDA N, KUMARI K W S N, et al. A monadic and effective frame work for single human pose estimation of 2D images and videos[C]//Proceedings of the Second International Conference on Image Processing and Capsule Networks. Berlin: Springer, 2022: 254-268.
