北京航空航天大学学报 ›› 2017, Vol. 43 ›› Issue (6): 1239-1246.doi: 10.13700/j.bh.1001-5965.2016.0462

• 论文 • 上一篇    下一篇

基于正则化秩k矩阵逼近的稀疏主成分分析

杨茜, 刘红英   

  1. 北京航空航天大学 数学与系统科学学院, 北京 100083
  • 收稿日期:2016-05-31 出版日期:2017-06-20 发布日期:2016-10-10
  • 通讯作者: 刘红英,E-mail:liuhongying@buaa.edu.cn E-mail:liuhongying@buaa.edu.cn
  • 作者简介:杨茜 女,硕士研究生。主要研究方向:数值最优化及其应用;刘红英 女,博士,副教授,硕士生导师。主要研究方向:数值最优化及其应用。
  • 基金资助:
    国家自然科学基金(61172060,61403011)

Sparse principal component analysis via regularized rank-k matrix approximation

YANG Qian, LIU Hongying   

  1. School of Mathematics and Systems Science, Beijing University of Aeronautics and Astronautics, Beijing 100083, China
  • Received:2016-05-31 Online:2017-06-20 Published:2016-10-10
  • Supported by:
    National Natural Science Foundation of China (61172060, 61403011)

摘要: 在计算稀疏主成分(PCs)时,由于同时求k个主成分的做法可以减少计算所产生的累积误差,因此提出了基于正则化秩k矩阵逼近的稀疏主成分模型,并设计了求解该模型的块坐标下降法(BCD-sPCA-rSVD)。该算法的主要思想是先把变量按坐标分成2k个块,当固定其他2k-1个坐标块的变量时,求解关于单个坐标块的子问题并给出子问题的显式解,循环地求解这些子问题直至满足终止条件。该算法每次迭代的计算复杂度关于样本个数与变量维数都是线性的,并且证明了它是收敛的。该算法不仅易于实现,数值仿真结果表明,该算法应用到真实数据与合成数据上都是可行且有效的。它不仅使累积误差降低,而且具有较低的计算复杂度,因而可以有效地求解大规模稀疏主成分分析问题。

关键词: 降维, 稀疏主成分, 正则化, 块坐标下降法, 奇异值分解, 阈值

Abstract: In calculating the sparse principal components (PCs), attaining k PCs simultaneously can reduce the accumulated error arising from the calculation process. We proposed the sparse principal component model via regularized rank-k matrix approximation and designed a block coordinate descent method (BCD-sPCA-rSVD) to solve this problem. Its main idea is to first divide variables into 2k blocks by coordinates, and then solve sub-problem with respect to each single coordinate block when keeping other 2k-1 variables fixed. By solving these sub-problems with explicit solutions recursively until the stopping criterion is satisfied, the BCD-sPCA-rSVD algorithm can be easily constructed. Its per-iteration complexity is linear in both sample size and variable dimensionality. The algorithm is convergent and easy to implement. Numerical simulation results show that the algorithm is feasible and effective when applied to real and synthetic data sets. The proposed method reduces the accumulated error and has lower computational complexity, which makes it well suited to handling large-scale problems of sparse principal component analysis.

Key words: dimension reduction, sparse principal component, regularization, block coordinate descent method, singular value decomposition, threshold

中图分类号: 


版权所有 © 《北京航空航天大学学报》编辑部
通讯地址:北京市海淀区学院路37号 北京航空航天大学学报编辑部 邮编:100191 E-mail:jbuaa@buaa.edu.cn
本系统由北京玛格泰克科技发展有限公司设计开发