Human 3D posture detection and modeling based on automatic variational correction in multi-vision
Abstract: Most existing methods that build 3D human body models from 2D body-surface pose points suffer from jitter across consecutive frames and local distortion of the reconstructed model. To address these problems, this paper proposes a method for detecting 3D pose points inside the human body based on the skinned multi-person linear (SMPL) model. A clustering algorithm maps the in-body 2D pose points observed from multiple views to 3D pose points in the real scene, and a Kalman filter is introduced to denoise the pose points. When the 3D human model is built from the 3D pose points, the gradient descent regression network is corrected with an automatic variational method, yielding an end-to-end 3D human modeling network, SMPL-VAE, whose local modeling better conforms to the structure of human motion while preserving overall body proportions. In tests on the public Shelf dataset, the proposed method improved the mean per-joint position error (MPJPE) by 3.88, 7.56, and 12.88 and the percentage of correct keypoints (PCK) by 3.5, 6.91, and 9 relative to the other methods compared, respectively, and it matched pose points accurately for different targets.
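The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PCK is the percentage of joints whose error falls within a distance threshold. The function names, the threshold value, and the three-joint sample data are hypothetical and are not taken from the paper, which may use a different threshold convention.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance per joint."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def pck(pred, gt, threshold):
    """Percentage of joints whose error is within `threshold` (assumed convention)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return 100.0 * hits / len(pred)


# Hypothetical three-joint skeleton; coordinates are illustrative only.
gt_joints = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
pred_joints = [(3.0, 4.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 12.0)]

# Per-joint errors here are 5, 0, and 12 in the same unit as the coordinates.
print(mpjpe(pred_joints, gt_joints))
print(pck(pred_joints, gt_joints, threshold=10.0))
```

Under these definitions a lower MPJPE and a higher PCK both indicate better pose estimates, which is why an improvement in MPJPE corresponds to a smaller error.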

