Supervised clustering of variables based on Gram-Schmidt transformation

LIU Ruiping, WANG Huiwen, WANG Shanshan

Citation: LIU Ruiping, WANG Huiwen, WANG Shanshan, et al. Supervised clustering of variables based on Gram-Schmidt transformation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(10): 2003-2010. doi: 10.13700/j.bh.1001-5965.2019.0050 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2019.0050
Funding:

National Natural Science Foundation of China 71420107025

National Natural Science Foundation of China 11701023

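Author information
    About the authors:

    LIU Ruiping   Female, PhD candidate. Main research interests: dimension reduction methods for high-dimensional data and their applications

    WANG Huiwen   Female, PhD, professor, doctoral supervisor. Main research interests: theory, methods, and applications of statistical analysis of complex data in economics and management

    WANG Shanshan   Female, PhD, assistant professor, master's supervisor. Main research interests: high-dimensional complex data analysis, semiparametric statistics, machine learning, statistical algorithms and applications

    Corresponding author:

    WANG Shanshan, E-mail: sswang@buaa.edu.cn

  • CLC number: O212.4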

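  • Abstract:

    To further study dimension reduction for high-dimensional data in regression models, a new supervised clustering of variables method based on the Gram-Schmidt transformation (SCV-GS) is proposed. Instead of hierarchical clustering with latent variables as cluster centers, the method borrows the idea of variable screening: it selects, one at a time, the key variables that contribute substantially to the response and uses them as cluster centers. SCV-GS uses the Gram-Schmidt transformation to handle highly correlated variables in batches and thereby obtain the clustering. In addition, drawing on the idea of partial least squares, a new homogeneity measure is proposed and used to select the optimal aggregation parameter. SCV-GS not only yields the variable clustering quickly but also identifies the groups of variables that play a key role in explaining and predicting the response. Simulations show that the method is markedly faster, and that the estimated regression coefficients of the resulting latent variables agree with those of the comparison method; a real-data analysis shows that the method offers better interpretability and predictive ability.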
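
    The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes, not the authors' SCV-GS algorithm. It assumes that the key variable at each step is the remaining column with the largest absolute correlation with the current response residual, that a column joins a cluster when its absolute correlation with the center reaches a fixed `threshold` (a stand-in for the aggregation parameter, chosen here by hand and not by the homogeneity measure), and that the Gram-Schmidt step projects the center out of the remaining columns and the response. All function and parameter names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(v):
    """Subtract the mean so that dot products behave like covariances."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def corr(u, v):
    """Correlation of two centered vectors (0.0 if either is numerically zero)."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu < 1e-12 or nv < 1e-12:
        return 0.0
    return dot(u, v) / (nu * nv)


def residual(v, z):
    """Gram-Schmidt step: remove from v its projection onto z."""
    coef = dot(v, z) / dot(z, z)
    return [a - coef * b for a, b in zip(v, z)]


def gs_variable_clusters(columns, y, threshold=0.8, max_clusters=None):
    """Hypothetical sketch; returns a list of (center_index, member_indices).

    columns: list of predictor columns (each a list of floats)
    y: response values
    threshold: assumed stand-in for the aggregation parameter
    """
    cols = {j: center(c) for j, c in enumerate(columns)}
    r = center(y)
    clusters = []
    while cols and (max_clusters is None or len(clusters) < max_clusters):
        # Assumed selection rule: the remaining column most correlated
        # with the current response residual becomes the cluster center.
        k = max(cols, key=lambda j: abs(corr(cols[j], r)))
        z = cols[k]
        if dot(z, z) < 1e-12:
            break  # nothing informative left; remaining columns stay unassigned
        # Batch step: every remaining column highly correlated with the
        # center joins its cluster and leaves the candidate pool.
        members = [j for j in cols
                   if j == k or abs(corr(cols[j], z)) >= threshold]
        clusters.append((k, sorted(members)))
        for j in members:
            del cols[j]
        # Orthogonalize what is left (and the response) against the center.
        cols = {j: residual(c, z) for j, c in cols.items()}
        r = residual(r, z)
    return clusters
```

    On data with two independent latent factors, where three columns are noisy copies of the first factor, two columns are noisy copies of the second, and the response depends mostly on the first factor, this sketch would be expected to return one cluster per factor, with the first cluster centered on a column from the group that drives the response. Correlations after the first step are computed on the orthogonalized columns, which is one of several possible design choices and is not confirmed by the abstract.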

     

  • Figure 1.  Comparison of computation time (in seconds) between SCV-LV and SCV-GS

    Table 1.  Variable clustering results by SCV-GS

    Variable G1 G2 G3 G4 G5
    x1 100 0 0 0 0
    x2 100 0 0 0 0
    x3 100 0 0 0 0
    x4 100 0 0 0 0
    x5 100 0 0 0 0
    x6 100 0 0 0 0
    x7 100 0 0 0 0
    x8 100 0 0 0 0
    x9 100 0 0 0 0
    x10 100 0 0 0 0
    x11 100 0 0 0 0
    x12 100 0 0 0 0
    x13 100 0 0 0 0
    x14 100 0 0 0 0
    x15 100 0 0 0 0
    x16 100 0 0 0 0
    x17 99 0 0 0 0
    x18 100 0 0 0 0
    x19 99 0 0 0 0
    x20 100 0 0 0 0
    x21 0 97 2 0 0
    x22 0 100 0 0 0
    x23 0 99 1 0 0
    x24 0 100 0 0 0
    x25 0 99 0 0 0
    x26 0 100 0 0 0
    x27 0 99 1 0 0
    x28 0 98 1 0 0
    x29 0 99 0 0 0
    x30 0 99 1 0 0
    x31 0 99 1 0 0
    x32 0 98 2 0 0
    x33 0 99 1 0 0
    x34 0 100 0 0 0
    x35 0 100 0 0 0
    x36 0 100 0 0 0
    x37 0 100 0 0 0
    x38 0 99 0 0 1
    x39 0 100 0 0 0
    x40 0 100 0 0 0
    x41 0 0 100 0 0
    x42 0 0 100 0 0
    x43 0 0 100 0 0
    x44 0 0 100 0 0
    x45 0 0 100 0 0
    x46 0 1 99 0 0
    x47 0 0 99 0 1
    x48 0 0 100 0 0
    x49 0 0 100 0 0
    x50 0 0 100 0 0
    x51 0 0 0 100 0
    x52 0 0 0 100 0
    x53 0 0 0 100 0
    x54 0 0 0 100 0
    x55 0 0 0 100 0
    x56 0 0 0 100 0
    x57 0 0 0 100 0
    x58 0 0 0 100 0
    x59 0 0 0 100 0
    x60 0 0 0 100 0
    x61 0 0 0 0 100
    x62 0 0 0 0 100
    x63 0 0 0 0 100
    x64 0 0 0 0 100
    x65 0 0 0 0 100
    x66 0 0 0 0 100
    x67 0 0 0 0 100
    x68 0 0 0 0 100
    x69 0 0 0 0 100
    x70 0 0 0 0 100
    x71 0 0 0 0 100
    x72 0 0 0 0 100
    x73 0 0 0 0 100
    x74 0 0 0 0 100
    x75 0 0 0 0 100
    x76 0 0 0 0 100
    x77 0 0 0 0 100
    x78 0 0 0 0 100
    x79 0 0 0 0 100
    x80 0 0 0 0 100

    Table 2.  Regression coefficients of the latent variables estimated by SCV-LV and SCV-GS

    Method
    SCV-LV -0.07(0.44) -0.08(0.43) 0.03(0.28)
    SCV-GS -0.10(0.44) -0.10(0.43) 0.02(0.28)

    Table 3.  Results on real dataset by SCV-LV and SCV-GS

    Method K Variables
    SCV-LV 1 0.83 x1~x19
    SCV-GS 16 0.87 x2, x3
Publication history
  • Received:  2019-02-16
  • Accepted:  2019-03-15
  • Published:  2019-10-20