Human action recognition based on locality-constrained linear coding

BAI Chen; SUN Junhua

doi:10.13700/j.bh.1001-5965.2014.0414

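School of Instrumentation Science and Opto-electronics Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100191, China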

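Author biography:
BAI Chen (born 1990), male, from Tianjin, master's student, chenbai@aspe.buaa.edu.cn

Corresponding author:
SUN Junhua (born 1975), male, from Jingmen, Hubei, professor, sjh@buaa.edu.cn; main research interests: vision measurement, image analysis and recognition.

CLC number: TP391.4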
Publication history
- Received: 2014-07-10
- Published online: 2015-06-20

Abstract

Large intra-class variation in action features lowers the accuracy of action classification, and current algorithms fall short both in computational complexity and in extending the set of recognizable action classes. To address these problems, a human action recognition method based on locality-constrained linear coding (LLC) is proposed. The positions, velocities, and accelerations of the human body joints are concatenated to form local action features. LLC is then used to compute sparse representations of these local features, which reduces intra-class variation and increases discriminative power. Because the coding has an analytical solution, the method processes video at up to 760 frames/s. The dictionary is composed of sub-dictionaries learned by K-means separately from the data of each class, so no global optimization is needed when the set of recognizable action classes is extended. In addition, to avoid classifier over-fitting when the dictionary is large, the dimensionality of the coding coefficients is reduced using the class labels of the dictionary items. The method was evaluated on the MSR-Action3D dataset, which was captured with depth cameras, and achieved a recognition rate of 85.7%.

Keywords:
- action recognition
- locality-constrained linear coding
- dictionary learning
- temporal pyramid matching
- depth images
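
The abstract attributes the method's speed to the fact that LLC coding has an analytical solution. As an illustration only, the sketch below implements the approximated k-nearest-neighbour form of LLC described by Wang et al. in the original LLC paper (CVPR 2010): each feature is reconstructed from its k nearest dictionary items by solving one small linear system, with the coefficients constrained to sum to 1. The abstract does not say which LLC variant or parameter values the authors used, so the function names `solve` and `llc_encode`, the neighbourhood size `k`, and the regulariser `beta` are assumptions made for this example. It uses only the Python standard library; a numerical library's linear solver could replace `solve`.

```python
def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def llc_encode(x, codebook, k=5, beta=1e-4):
    """Approximated LLC code of feature x over a codebook (list of atoms).

    Hypothetical helper: k and beta are illustrative defaults, not values
    taken from the paper. Returns one coefficient per atom; at most k are
    nonzero and they sum to 1.
    """
    k = min(k, len(codebook))
    # 1. Pick the k nearest atoms by squared Euclidean distance.
    dists = [sum((xi - bi) ** 2 for xi, bi in zip(x, atom)) for atom in codebook]
    nearest = sorted(range(len(codebook)), key=dists.__getitem__)[:k]
    # 2. Shift the selected atoms so that x becomes the origin.
    z = [[bi - xi for bi, xi in zip(codebook[j], x)] for j in nearest]
    # 3. Local covariance C = Z Z^T, regularised on the diagonal.
    cov = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    trace = sum(cov[i][i] for i in range(k))
    reg = beta * max(trace, 1e-12)  # guard against a zero trace
    for i in range(k):
        cov[i][i] += reg
    # 4. Closed-form step: solve C w = 1, then rescale so that sum(w) = 1.
    w = solve(cov, [1.0] * k)
    total = sum(w)
    code = [0.0] * len(codebook)
    for j, wj in zip(nearest, w):
        code[j] = wj / total
    return code


if __name__ == "__main__":
    # Toy 2-D codebook; the last atom is far from the query feature.
    toy_codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(llc_encode([0.25, 0.25], toy_codebook, k=3))
```

In the toy call, only the three nearest atoms receive nonzero coefficients, and the distant atom stays at exactly zero. Each feature costs one k-by-k linear solve rather than an iterative optimization, which is the property the abstract relies on for speed.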
