
Sparse principal component analysis via regularized rank-k matrix approximation

YANG Qian, LIU Hongying

杨茜, 刘红英. 基于正则化秩k矩阵逼近的稀疏主成分分析[J]. 北京航空航天大学学报, 2017, 43(6): 1239-1246. doi: 10.13700/j.bh.1001-5965.2016.0462
Citation: YANG Qian, LIU Hongying. Sparse principal component analysis via regularized rank-k matrix approximation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2017, 43(6): 1239-1246. doi: 10.13700/j.bh.1001-5965.2016.0462 (in Chinese)


doi: 10.13700/j.bh.1001-5965.2016.0462
Funds: 

National Natural Science Foundation of China 61172060

National Natural Science Foundation of China 61403011

More information
    Author bios:

    YANG Qian, female, M.S. candidate. Research interests: numerical optimization and its applications.

    LIU Hongying, female, Ph.D., associate professor, M.S. supervisor. Research interests: numerical optimization and its applications.

    Corresponding author:

    LIU Hongying, E-mail: liuhongying@buaa.edu.cn

  • CLC number: O212.4; TP181; O221.2


  • Abstract:

    When computing sparse principal components (PCs), extracting all k components simultaneously reduces the error that accumulates when the components are computed one at a time. Motivated by this, a sparse PCA model based on regularized rank-k matrix approximation is proposed, together with a block coordinate descent algorithm (BCD-sPCA-rSVD) for solving it. The main idea of the algorithm is to partition the variables into 2k coordinate blocks; with the other 2k-1 blocks fixed, the subproblem in a single block is solved in closed form, and these subproblems are solved cyclically until a stopping criterion is met. The per-iteration computational complexity is linear in both the number of samples and the number of variables, and the algorithm is proved to be convergent. The algorithm is easy to implement, and numerical results on both real and synthetic data show that it is feasible and effective. Because it reduces the accumulated error while keeping a low computational complexity, it can solve large-scale sparse PCA problems efficiently.
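The cyclic closed-form block updates described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the model min ‖X − UVᵀ‖²_F + Σⱼ λⱼ‖vⱼ‖₁ with unit-norm columns uⱼ, whose single-block subproblems reduce to soft-thresholding and normalization as in sPCA-rSVD-type methods; the function name and default parameters are illustrative.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding: the closed-form solution of the l1 subproblem."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def bcd_spca_rsvd(X, k, lam, max_iter=200, tol=1e-6):
    """Sketch of block coordinate descent for regularized rank-k approximation:
    minimize ||X - U V^T||_F^2 + sum_j lam_j * ||v_j||_1  s.t. ||u_j||_2 = 1,
    cycling over the 2k coordinate blocks u_1..u_k, v_1..v_k; each block
    update has a closed-form solution."""
    U_svd, s, Vt = np.linalg.svd(X, full_matrices=False)
    U = U_svd[:, :k].copy()                      # initialize from the plain SVD
    V = (Vt[:k, :].T * s[:k]).copy()
    lam = np.broadcast_to(np.asarray(lam, float), (k,))
    prev = np.inf
    for _ in range(max_iter):
        for j in range(k):
            # residual with component j removed
            R = X - U @ V.T + np.outer(U[:, j], V[:, j])
            # v_j block: soft-thresholded projection of the residual
            V[:, j] = soft_threshold(R.T @ U[:, j], lam[j] / 2.0)
            # u_j block: normalized projection
            w = R @ V[:, j]
            nrm = np.linalg.norm(w)
            if nrm > 0:
                U[:, j] = w / nrm
        obj = np.linalg.norm(X - U @ V.T, "fro")**2 \
            + np.sum(lam * np.abs(V).sum(axis=0))
        if abs(prev - obj) < tol * max(1.0, prev):   # stopping criterion
            break
        prev = obj
    return U, V  # sparse loadings are the columns of V

```

Each inner update touches only O(np) entries, which is consistent with the per-iteration complexity being linear in both the number of samples and the number of variables.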

     

  • Table 1  Computational complexity of each sparse PCA algorithm

    Algorithm        n > p          n < p
    BCD-sPCA-rSVD    O(Jk + nk²)    O(Jk + pk²)
    SPCA             O(p³)          O(Jk + pk²)
    BCD-SPCA         O(Jk + nk²)    O(Jk + pk²)

    Table 2  Pitprop data: Performance indicators of each sparse PCA algorithm

    Algorithm        Sparsity   Non-orthogonality   Correlation   PEV/%   RRE/%
    PCA              0          0                   0             86.94   36.06
    DSPCA            63         13.63               0.57          72.46   47.71
    Gpower           63         17.88               0.51          75.04   45.89
    SCoTLASS         27         0.32                0.44          78.24   49.24
    ALSPCA           63         0                   0.30          73.32   45.37
    sPCA-rSVD-soft   53         14.76               0.46          76.59   46.76
    SPCA             60         0.86                0.40          75.82   44.48
    BCD-SPCA         63         20.05               0.40          75.86   44.19
    BCD-sPCA-rSVD    63         1.51                0.28          75.13   44.18
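As a rough illustration of the two quality metrics reported in the tables, the following sketch computes simple, non-adjusted versions: PEV as the percentage of total variance captured by projecting the data onto the span of the loadings, and RRE as the relative Frobenius reconstruction error. The paper's exact definitions (e.g. an adjusted-variance PEV) may differ; the function name is illustrative.

```python
import numpy as np

def pev_rre(X, V):
    """Naive metrics for sparse loadings V (p x k) on data X (n x p).
    PEV: percent of total variance captured by projecting X onto span(V).
    RRE: relative Frobenius reconstruction error of that projection, in percent."""
    Q, _ = np.linalg.qr(V)              # orthonormal basis of the loading span
    Xp = X @ Q @ Q.T                    # projection of the data onto that span
    total = np.linalg.norm(X, "fro")
    pev = 100.0 * np.linalg.norm(Xp, "fro")**2 / total**2
    rre = 100.0 * np.linalg.norm(X - Xp, "fro") / total
    return pev, rre
```

With full-rank loadings the projection recovers X exactly, giving PEV = 100 and RRE = 0; sparser loadings trade PEV for interpretability, as in Table 2.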

    Table 3  Synthetic data: Loadings of PCA and each sparse PCA algorithm

    Variable      PCA                   BCD-sPCA-rSVD        SPCA (BCD-SPCA)
                  PC1        PC2        PC1       PC2        PC1       PC2
    X1        -0.116 3   -0.478 4      0      -0.500 0      0      -0.500 0
    X2        -0.116 2   -0.478 4      0      -0.500 0      0      -0.500 0
    X3        -0.116 2   -0.478 4      0      -0.500 0      0      -0.500 0
    X4        -0.116 2   -0.478 3      0      -0.500 0      0      -0.500 0
    X5         0.395 1   -0.145 3      0.500 0    0         0.500 0    0
    X6         0.395 1   -0.145 3      0.500 0    0         0.500 0    0
    X7         0.395 1   -0.145 4      0.500 0    0         0.500 0    0
    X8         0.395 1   -0.145 3      0.500 0    0         0.500 0    0
    X9         0.400 9    0.009 1      0          0         0          0
    X10        0.400 9    0.009 1      0          0         0          0
    PEV/%      99.72                   80.46                80.46

    Table 4  Colon Cancer data: Performance and efficiency indicators of each sparse PCA algorithm

    Algorithm       Sparsity   Non-orthogonality   Correlation   PEV/%   RRE/%   Computation time/s   Iterations
    PCA             0          0                   0             58.35   64.54   1.98
    SPCA            5 370      22.88               0.49          46.12   65.48   4.47                 165
    BCD-SPCA        5 373      29.15               0.54          48.88   69.73   3.65                 97
    BCD-sPCA-rSVD   5 376      24.02               0.48          47.24   65.34   2.76                 91

    Table 5  20Newsgroups data: Performance and efficiency indicators of each sparse PCA algorithm

    Algorithm       Sparsity   Non-orthogonality   Correlation   PEV/%   RRE/%   Time/s   Iterations
    PCA             0          0                   0             10.69   94.50   1.08
    SPCA            161        0.07                0.15          8.42    95.76   2.62     192
    BCD-SPCA        161        17.64               0.30          8.71    95.34   3.11     58
    BCD-sPCA-rSVD   166        0.02                0.09          8.58    95.60   2.73     52
Publication history
  • Received:  2016-05-31
  • Accepted:  2016-09-02
  • Available online:  2017-06-20
