Volume 45 Issue 10
Oct.  2019
LIU Ruiping, WANG Huiwen, WANG Shanshan, et al. Supervised clustering of variables based on Gram-Schmidt transformation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(10): 2003-2010. doi: 10.13700/j.bh.1001-5965.2019.0050 (in Chinese)

Supervised clustering of variables based on Gram-Schmidt transformation

doi: 10.13700/j.bh.1001-5965.2019.0050
Funds:

National Natural Science Foundation of China 71420107025

National Natural Science Foundation of China 11701023

More Information
  • Corresponding author: WANG Shanshan, E-mail: sswang@buaa.edu.cn
  • Received Date: 16 Feb 2019
  • Accepted Date: 15 Mar 2019
  • Publish Date: 20 Oct 2019
  • To further study regression-based dimension reduction for high-dimensional data, a supervised clustering of variables algorithm based on the Gram-Schmidt transformation (SCV-GS) is proposed. Unlike hierarchical variable clustering around latent variables, SCV-GS uses the key variables selected one at a time by a variable screening procedure as the cluster centers. High correlation among variables is handled through the Gram-Schmidt transformation, which yields the clustering results. In addition, drawing on the concept of partial least squares, a new "homogeneity" criterion is proposed for selecting the optimal clustering parameters. SCV-GS not only produces variable clustering results quickly, but also identifies the variable groups most relevant to the response variable and the structure through which those variables influence it. Simulation results show that SCV-GS significantly improves computation speed and that the estimated regression coefficients corresponding to the latent variables are consistent with those of the comparison method. Real data analysis shows that SCV-GS performs better in interpretation and prediction.
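
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: pick a key variable by screening against the response, group the variables highly correlated with it, then remove that variable's direction from the remaining variables and the response with a Gram-Schmidt step. The function name `scv_gs_sketch`, the fixed cutoff `corr_threshold`, and the use of absolute correlation for both screening and grouping are assumptions made for illustration; the paper's "homogeneity" criterion for choosing the clustering parameters is not reproduced here.

```python
import numpy as np


def scv_gs_sketch(X, y, corr_threshold=0.7, max_clusters=None):
    """Illustrative greedy variable clustering with Gram-Schmidt deflation.

    Not the authors' implementation: corr_threshold is a hypothetical
    stand-in for the clustering parameter mentioned in the abstract.
    """
    # Center the data so inner products behave like covariances.
    R = np.asarray(X, dtype=float)
    R = R - R.mean(axis=0)
    r_y = np.asarray(y, dtype=float)
    r_y = r_y - r_y.mean()

    p = R.shape[1]
    remaining = list(range(p))
    clusters = []
    if max_clusters is None:
        max_clusters = p

    def abs_corr(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return 0.0 if denom < 1e-12 else abs(a @ b) / denom

    while remaining and len(clusters) < max_clusters:
        # Screening step: the key variable is the remaining column most
        # correlated with the current (deflated) response.
        scores = [abs_corr(R[:, j], r_y) for j in remaining]
        key = remaining[int(np.argmax(scores))]
        z = R[:, key].copy()
        if np.linalg.norm(z) < 1e-12:
            break  # nothing informative is left

        # Clustering step: gather variables highly correlated with the key.
        members = [j for j in remaining
                   if j == key or abs_corr(R[:, j], z) >= corr_threshold]
        clusters.append({"center": key, "members": members})
        remaining = [j for j in remaining if j not in members]

        # Gram-Schmidt step: project the key direction out of the
        # remaining variables and the response.
        q = z / np.linalg.norm(z)
        for j in remaining:
            R[:, j] -= (q @ R[:, j]) * q
        r_y = r_y - (q @ r_y) * q

    return clusters


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    f = rng.normal(size=100)
    X_demo = np.column_stack([f, f + 0.1 * rng.normal(size=100),
                              rng.normal(size=100)])
    y_demo = f + 0.1 * rng.normal(size=100)
    print(scv_gs_sketch(X_demo, y_demo))
```

Because each key variable's direction is removed before the next screening step, later cluster centers are chosen for the information they add beyond the earlier ones, which is the usual motivation for Gram-Schmidt-based screening.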

     
