Human 3D posture detection and modeling based on automatic variational correction in multi-vision

ZHAO Xiaodong, CHEN Kai, HUANG Yujie, WANG Pengfei, WANG Ziyuan

Citation: ZHAO X D, CHEN K, HUANG Y J, et al. Human 3D posture detection and modeling based on automatic variational correction in multi-vision[J]. Journal of Beijing University of Aeronautics and Astronautics, 2026, 52(4): 1290-1299 (in Chinese).


doi: 10.13700/j.bh.1001-5965.2024.0070
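Funds:

National Natural Science Foundation of China (52202417); China Postdoctoral Science Foundation (2022TQ0155, 2022M721605); Open Project Program of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (VRLAB2023A02); Young Elite Scientists Sponsorship Program by CAST (2023QNRC001); Young Elite Scientists Sponsorship Program by the Jiangsu Association for Science and Technology (JSTJ-2023-XH032)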

    Corresponding author:

    E-mail: chen_kai@nuaa.edu.cn

CLC number: TP391.4

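Abstract:

    Most existing methods that build 3D human models from 2D pose points on the body surface suffer from jitter during continuous modeling and from local distortion in the modeling results. To address these problems, a method based on the skinned multi-person linear model (SMPL) is proposed for detecting 3D pose points inside the human body. A clustering algorithm maps the in-body 2D pose points observed from multiple views to 3D pose points in the real scene, and a Kalman filter is introduced to denoise the human pose points. When the 3D human model is built from the 3D pose points, the gradient-descent regression network is corrected with an automatic variational method. This yields an end-to-end 3D human modeling network, SMPL-VAE, whose local modeling better conforms to the kinematic structure of the human body while the overall body proportions are preserved. Tests on the public Shelf dataset show that, compared with other methods, the proposed method lowers the mean per joint position error (MPJPE) by 3.88, 7.56 and 12.88, and raises the percentage of correct keypoints (PCK) by 3.5, 6.91 and 9, respectively. The method also matches pose points accurately across different targets.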
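The abstract maps multi-view 2D pose points to 3D scene points with a clustering algorithm whose details are not given on this page. The sketch below illustrates only the underlying multi-view geometry. It uses standard linear (least-squares) triangulation of a single joint and assumes calibrated cameras with known 3×4 projection matrices and 2D detections already matched across views. It is not the authors' clustering method, and the names `project` and `triangulate` are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix34 = Sequence[Sequence[float]]  # assumed 3x4 pinhole projection matrix P = K [R | t]


def project(P: Matrix34, X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D point into image coordinates with projection matrix P."""
    Xh = (X[0], X[1], X[2], 1.0)
    h = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    return h[0] / h[2], h[1] / h[2]


def _det3(a: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def triangulate(Ps: Sequence[Matrix34], uvs: Sequence[Tuple[float, float]]) -> List[float]:
    """Linear least-squares triangulation of one joint observed in several views."""
    rows, rhs = [], []
    for P, (u, v) in zip(Ps, uvs):
        # Each view gives two equations in X = (x, y, z):
        # (u * P[2] - P[0]) . (x, y, z, 1) = 0 and (v * P[2] - P[1]) . (x, y, z, 1) = 0
        for coord, r in ((u, 0), (v, 1)):
            a = [coord * P[2][c] - P[r][c] for c in range(4)]
            rows.append(a[:3])
            rhs.append(-a[3])
    # Normal equations (A^T A) X = A^T b, solved with Cramer's rule
    AtA = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(row[i] * b for row, b in zip(rows, rhs)) for i in range(3)]
    d = _det3(AtA)
    if abs(d) < 1e-12:  # absolute threshold, so it is scale-dependent
        raise ValueError("degenerate configuration: need at least two non-coincident views")
    solution = []
    for k in range(3):
        Mk = [[Atb[i] if j == k else AtA[i][j] for j in range(3)] for i in range(3)]
        solution.append(_det3(Mk) / d)
    return solution
```

With noise-free projections from two or more distinct camera centres, `triangulate` recovers the original point. With noisy detections it returns the algebraic least-squares estimate. For pixel-scale matrices, normalizing the image coordinates first (as in Hartley's normalized DLT) keeps the system well conditioned.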

     

    Figure 1.  Pipeline of multi-view 3D pose point detection

    Figure 2.  Schematic of the 2D pose point recovery process in each view

    Figure 3.  3D coordinate regression

    Figure 4.  Comparison of the effect of Kalman filtering
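The abstract introduces a Kalman filter to denoise the detected pose points, and Figure 4 compares its effect. This page does not give the paper's state model or noise settings. The sketch below therefore assumes a constant-velocity model applied independently to one coordinate of one joint across frames. The function name `kalman_smooth_1d` and the tuning values `q` and `r` are hypothetical.

```python
from typing import List


def kalman_smooth_1d(zs: List[float], dt: float = 1.0,
                     q: float = 1e-3, r: float = 1e-2) -> List[float]:
    """Filter one joint coordinate over frames with a constant-velocity Kalman filter.

    State is [position, velocity]; only the position is observed.
    q (process-noise intensity) and r (measurement-noise variance) are
    assumed tuning values, not parameters taken from the paper.
    """
    if not zs:
        return []
    x = [zs[0], 0.0]                      # start at the first observation, at rest
    P = [[1.0, 0.0], [0.0, 1.0]]          # initial state covariance
    # Discrete white-noise-acceleration process covariance
    Q = [[q * dt**4 / 4, q * dt**3 / 2],
         [q * dt**3 / 2, q * dt**2]]
    out = []
    for z in zs:
        # Predict with F = [[1, dt], [0, 1]]: x <- F x, P <- F P F^T + Q
        x = [x[0] + dt * x[1], x[1]]
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + Q[0][0]
        p01 = P[0][1] + dt * P[1][1] + Q[0][1]
        p10 = P[1][0] + dt * P[1][1] + Q[1][0]
        p11 = P[1][1] + Q[1][1]
        # Update with H = [1, 0]: gain K = P H^T / (H P H^T + r)
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        innovation = z - x[0]
        x = [x[0] + k0 * innovation, x[1] + k1 * innovation]
        # Covariance update: P <- (I - K H) P
        P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        out.append(x[0])
    return out
```

For a 3D joint, the filter would be run separately on the x, y and z tracks. A single filter with a joint 3D covariance is the natural generalization. A smaller `q` relative to `r` smooths more strongly at the cost of lagging behind fast motion.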

    Figure 5.  Framework for 3D human body modeling based on automatic variational correction
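The SMPL-VAE framework in Figure 5 corrects a gradient-descent regression network with an automatic variational method, building on the auto-encoding variational Bayes work [13]. This page does not specify the network layers, the latent variables or the loss weights. The sketch below is a hedged illustration of the two generic VAE ingredients from [13]: the reparameterization trick and the closed-form KL divergence of a diagonal Gaussian from a standard normal. The squared-error reconstruction term, the `beta` weight and all function names are assumptions.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Given eps, z is a deterministic function of mu and log_var, which is what
    lets gradients reach the encoder in an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(recon: Sequence[float], target: Sequence[float],
             mu: Sequence[float], log_var: Sequence[float], beta: float = 1.0) -> float:
    """Squared-error reconstruction term plus a beta-weighted KL regularizer."""
    rec = sum((a - b) ** 2 for a, b in zip(recon, target))
    return rec + beta * kl_to_standard_normal(mu, log_var)
```

The KL term pulls the latent code toward a standard normal. This is the usual mechanism by which a VAE regularizes a regressor. How strongly the paper weights it against reconstruction is not stated here.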

    Figure 6.  Human pose point detection results in key frames from each view

    Figure 7.  3D pose detection results in key frames

    Table 1.  Comparison with previous work on the Shelf dataset  %

    Method              MPJPE     PCK
    SMPL[11]            107.12    86.24
    SMPLify[19]         109.30    88.76
    SPEC[20]            108.62    86.42
    Shape-aware[10]     105.62    85.35
    This work           101.74    92.26
     Note: The best result in each column (lowest MPJPE, highest PCK) is achieved by this work.
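Table 1 reports MPJPE and PCK. This page does not state the PCK distance threshold, the MPJPE units, or whether joints are root-aligned before scoring. The sketch below gives the standard definitions under two assumptions: predicted and ground-truth joints share one coordinate frame and one joint ordering, and `threshold` is a hypothetical parameter supplied by the caller.

```python
import math
from typing import Sequence

Joint = Sequence[float]  # (x, y, z) in any consistent unit


def _check(pred: Sequence[Joint], gt: Sequence[Joint]) -> None:
    if not gt or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and the same length")


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Mean per joint position error: average Euclidean distance over joints."""
    _check(pred, gt)
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pck(pred: Sequence[Joint], gt: Sequence[Joint], threshold: float) -> float:
    """Percentage of joints whose position error is at most `threshold`."""
    _check(pred, gt)
    hits = sum(math.dist(p, g) <= threshold for p, g in zip(pred, gt))
    return 100.0 * hits / len(gt)
```

Lower MPJPE and higher PCK are better. Over a dataset, both are usually averaged across all evaluated frames and people.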
    [1] CAO Z, HIDALGO G, SIMON T, et al. OpenPose: realtime multi-person 2D pose estimation using part affinity fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(1): 172-186.
    [2] ZHANG W Q, FANG J M, WANG X G, et al. EfficientPose: efficient human pose estimation with neural architecture search[J]. Computational Visual Media, 2021, 7(3): 335-347.
    [3] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 5686-5696.
    [4] TOSHEV A, SZEGEDY C. DeepPose: human pose estimation via deep neural networks[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 1653-1660.
    [5] SUN Y, YE Y, LIU W, et al. Human mesh recovery from monocular images via a skeleton-disentangled representation[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 5348-5357.
    [6] PAVLAKOS G, CHOUTAS V, GHORBANI N, et al. Expressive body capture: 3D hands, face, and body from a single image[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 10967-10977.
    [7] LIU S C, SAITO S, CHEN W K, et al. Learning to infer implicit surfaces without 3D supervision[J]. Advances in Neural Information Processing Systems, 2019, 32: 8295-8306.
    [8] FANG H S, XIE S Q, TAI Y W, et al. RMPE: regional multi-person pose estimation[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2353-2362.
    [9] LI W B, WANG Z, YIN B Y, et al. Rethinking on multi-stage networks for human pose estimation[EB/OL]. (2019-01-01) [2023-11-25]. https://doi.org/10.48550/arXiv.1901.00148.
    [10] DONG Z J, SONG J, CHEN X, et al. Shape-aware multi-person pose estimation from multi-view images[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 11138-11148.
    [11] LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34(6): 248.
    [12] LI Z G, OSKARSSON M, HEYDEN A. 3D human pose and shape estimation through collaborative learning and multi-view model-fitting[C]//Proceedings of the 2021 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2021: 1887-1896.
    [13] KINGMA D P, WELLING M. Auto-encoding variational Bayes[EB/OL]. (2013-12-20) [2023-11-30]. https://doi.org/10.48550/arXiv.1312.6114.
    [14] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
    [15] ISHWARYA K, ALICE NITHYA A. Squirrel search optimization with deep convolutional neural network for human pose estimation[J]. Computers, Materials & Continua, 2023, 74(3): 6081-6099.
    [16] TEKIN B, KATIRCIOGLU I, SALZMANN M, et al. Structured prediction of 3D human pose with deep neural networks[EB/OL]. (2016-05-17) [2023-12-15]. https://doi.org/10.48550/arXiv.1605.05180.
    [17] MOON G, CHANG J Y, LEE K M. Camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 10132-10141.
    [18] TU H Y, WANG C Y, ZENG W G. VoxelPose: towards multi-camera 3D human pose estimation in wild environment[C]//Computer Vision–ECCV 2020. Berlin: Springer, 2020: 197-212.
    [19] BOGO F, KANAZAWA A, LASSNER C, et al. Keep it SMPL: automatic estimation of 3D human pose and shape from a single image[C]//Computer Vision–ECCV 2016. Berlin: Springer, 2016: 561-578.
    [20] KOCABAS M, HUANG C P, TESCH J, et al. SPEC: seeing people in the wild with an estimated camera[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 11015-11025.
    [21] KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7122-7131.
    [22] KWON O H, TANKE J, GALL J. Recursive Bayesian filtering for multiple human pose tracking from multiple cameras[C]//Computer Vision–ACCV 2020. Berlin: Springer, 2021: 438-453.
    [23] HUANG Y J, CHEN K, WANG Z Y, et al. A dense pedestrian tracking method based on fusion features under multi-vision[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(7): 2513-2525 (in Chinese).
    [24] FIERARU M, ZANFIR M, SZENTE T, et al. REMIPS: physically consistent 3D reconstruction of multiple interacting people under weak supervision[J]. Advances in Neural Information Processing Systems, 2021, 34: 19385-19397.
    [25] TULSIANI S, EFROS A A, MALIK J. Multi-view consistency as supervisory signal for learning shape and pose prediction[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2897-2905.
    [26] KE S R, ZHU L J, HWANG J N, et al. Real-time 3D human pose estimation from monocular view with applications to event detection and video gaming[C]//Proceedings of the 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance. Piscataway: IEEE Press, 2010: 489-496.
    [27] SHI J R, WANG D, SHANG F H, et al. Research advances on stochastic gradient descent algorithms[J]. Acta Automatica Sinica, 2021, 47(9): 2103-2119 (in Chinese).
    [28] WEI L H, ZHENG C, HU Y J. Oriented object detection in aerial images based on the scaled smooth L1 loss function[J]. Remote Sensing, 2023, 15(5): 1350.
    [29] RAMAKRISHNA V, MUNOZ D, HEBERT M, et al. Pose machines: articulated pose estimation via inference machines[C]//Computer Vision–ECCV 2014. Berlin: Springer, 2014: 33-47.
    [30] BELAGIANNIS V, AMIN S, ANDRILUKA M, et al. 3D pictorial structures for multiple human pose estimation[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 1669-1676.
    [31] TIAN W, GAO Z, TAN D Y. Single-view multi-human pose estimation by attentive cross-dimension matching[J]. Frontiers in Neuroscience, 2023, 17: 1201088.
    [32] RANI C J, DEVARAKONDA N, KUMARI K W S N, et al. A monadic and effective frame work for single human pose estimation of 2D images and videos[C]//Proceedings of the Second International Conference on Image Processing and Capsule Networks. Berlin: Springer, 2022: 254-268.
Publication history
  • Received: 2024-01-30
  • Accepted: 2024-05-14
  • Published online: 2024-06-14
  • Issue published: 2026-04-30
