Volume 47 Issue 7
Jul.  2021
ZHAO Qing, WANG Huiwen, WANG Shanshan, et al. A principal component analysis of interval data based on center and log-radius[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(7): 1414-1421. doi: 10.13700/j.bh.1001-5965.2020.0227 (in Chinese)

A principal component analysis of interval data based on center and log-radius

doi: 10.13700/j.bh.1001-5965.2020.0227
Funds: National Natural Science Foundation of China (71420107025, 11701023)

  • Corresponding author: WANG Shanshan. E-mail: sswang@buaa.edu.cn
  • Received Date: 29 May 2020
  • Accepted Date: 22 Aug 2020
  • Publish Date: 20 Jul 2021
  • Abstract: To study dimension reduction and visualization of multivariate interval data, each interval is represented as a two-dimensional array consisting of its center and log-radius. Algebraic operations on interval data are then defined for this representation, and a new Principal Component Analysis (PCA) method for interval data is proposed on that basis. Taking the logarithm of the interval radius guarantees that the ranges of the final interval principal components are non-negative. The new method is simple to compute and has low complexity. Furthermore, the relative positions of the sample points change as little as possible under the dimension reduction. Reducing the dimension of the variables in the high-dimensional space allows various classical statistical analysis methods to be applied. In addition, the sample points of the original high-dimensional space can be depicted in a low-dimensional space, which makes it possible to visualize multivariate interval data. Simulation results verify the effectiveness of the proposed method.
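
The center and log-radius representation described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the paper's algebraic operations, so this is not the authors' algorithm: it assumes, purely for illustration, that the loadings come from classical PCA on the interval centers and that the same linear map is applied to the centered log-radii. The function names `to_center_log_radius` and `interval_pca_sketch` are hypothetical. The point it shows is that exponentiating a projected log-radius always yields a positive radius, so every projected interval is well formed.

```python
import numpy as np


def to_center_log_radius(lower, upper):
    """Map intervals [lower, upper] to (center, log-radius) arrays."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if np.any(upper <= lower):
        # The log-radius is only defined for intervals of positive width.
        raise ValueError("each interval needs upper > lower")
    center = (lower + upper) / 2.0
    log_radius = np.log((upper - lower) / 2.0)
    return center, log_radius


def interval_pca_sketch(lower, upper, n_components=2):
    """Illustrative interval PCA; NOT the method of the paper.

    Assumption 1: loadings are taken from classical PCA on the centers.
    Assumption 2: the same loadings are applied to the centered log-radii.
    """
    center, log_radius = to_center_log_radius(lower, upper)

    # Classical PCA on the centers via SVD of the column-centered matrix.
    centered = center - center.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components].T  # shape (n_variables, n_components)

    pc_center = centered @ loadings
    pc_log_radius = (log_radius - log_radius.mean(axis=0)) @ loadings

    # exp() maps any real log-radius back to a strictly positive radius.
    pc_radius = np.exp(pc_log_radius)
    return pc_center - pc_radius, pc_center + pc_radius
```

Called on an `(n_samples, n_variables)` pair of lower and upper bound arrays, the function returns the lower and upper bounds of the interval principal components, which can be drawn as rectangles in the plane of the first two components.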

     
