Journal of Beijing University of Aeronautics and Astronautics ›› 2021, Vol. 47 ›› Issue (7): 1414-1421. doi: 10.13700/j.bh.1001-5965.2020.0227

A principal component analysis of interval data based on center and log-radius

ZHAO Qing1,2, WANG Huiwen1,3, WANG Shanshan1,2

  1. School of Economics and Management, Beihang University, Beijing 100083, China;
  2. Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations, Beijing 100083, China;
  3. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100083, China
  • Received: 2020-05-29 Published: 2021-08-06
  • Corresponding author: WANG Shanshan E-mail: sswang@buaa.edu.cn
  • Supported by:
    National Natural Science Foundation of China (71420107025, 11701023)

Abstract: To study dimension reduction and visualization of multivariate interval data, each interval is represented by a two-dimensional array consisting of its center and the logarithm of its radius (half-length). Algebraic operation rules for interval data are established under this representation, and a new Principal Component Analysis (PCA) method for interval data is proposed on that basis. Taking the logarithm of the radius guarantees that the radii of the resulting interval principal components are non-negative. The computation is simple and of low complexity, and it keeps the change in the relative positions of the sample points before and after dimension reduction as small as possible. Reducing the number of variables allows a variety of classical statistical methods to be applied, and the sample points of the original high-dimensional space can be depicted in a low-dimensional space, which makes visualization of multivariate interval data possible. Simulation experiments demonstrate the effectiveness of the proposed method.
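
The center and log-radius idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not state the algebraic rules or the exact covariance definition, so the sketch assumes that the covariance matrix is the sum of the center covariance and the log-radius covariance, and that one set of loadings is shared by both parts. The function names `to_center_logradius` and `interval_pca` are hypothetical.

```python
import numpy as np


def to_center_logradius(lower, upper):
    """Map interval bounds (n samples x p variables) to centers and log-radii."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if np.any(upper <= lower):
        # A zero-width interval has radius 0, whose logarithm is undefined.
        raise ValueError("each interval needs upper > lower")
    center = (lower + upper) / 2.0
    log_radius = np.log((upper - lower) / 2.0)
    return center, log_radius


def interval_pca(lower, upper, n_components=2):
    """Project interval data onto shared loadings; return PC bounds and eigenvalues."""
    center, log_radius = to_center_logradius(lower, upper)
    c = center - center.mean(axis=0)          # centered centers
    l = log_radius - log_radius.mean(axis=0)  # centered log-radii
    n = c.shape[0]

    # ASSUMPTION: covariance = center covariance + log-radius covariance.
    cov = (c.T @ c + l.T @ l) / (n - 1)

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]

    pc_center = c @ loadings
    # Exponentiating the projected log-radius yields a strictly positive radius.
    pc_radius = np.exp(l @ loadings)
    return pc_center - pc_radius, pc_center + pc_radius, eigvals[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lo = rng.normal(size=(20, 4))
    hi = lo + rng.uniform(0.1, 2.0, size=(20, 4))
    pc_lo, pc_hi, variances = interval_pca(lo, hi, n_components=2)
    print(pc_lo.shape, pc_hi.shape, variances)
```

Because the radius of each principal component is recovered through `np.exp`, it cannot be negative, which is the property the abstract attributes to the logarithmic treatment of the radius.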

Key words: interval data, Principal Component Analysis (PCA), center and log-radius, dimension reduction, covariance matrix
