Linear regression analysis for normal distribution-valued data based on complete information
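Abstract (translated from the Chinese original): A new linear regression method is proposed for normal distribution-valued symbolic data. Taking the preservation of all the original information in normal distribution-valued data as the starting point, definitions and calculation rules are given for the first moment, the second raw moment, and the second mixed raw moment of normal distribution-valued variables. On this basis, a linear regression model for normal distribution-valued data and the sum of squares of the residual information are defined, and the least-squares regression coefficients are derived. Simulation experiments demonstrate the effectiveness of the resulting regression model in explanatory and predictive power, as well as its superiority over the "centre method". The definitions and calculation rules given for the numerical characteristics of normal distribution-valued variables lay a foundation for extending other classical multivariate statistical methods to distribution-valued data.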
Keywords:
- normal distribution-valued symbolic data
- numerical characteristics
- complete information
- linear regression
Abstract: A new method for building linear regression models is proposed for normal distribution-valued symbolic data. To retain all the original information in such data, definitions and calculation rules are given for the first moment, the second raw moment, and the second mixed raw moment of normal distribution-valued variables. On this basis, a linear regression model for normal distribution-valued data and the sum of squares of the residual information are defined, and the least-squares regression coefficients are derived. Simulation results show that the resulting regression model is effective in both explanatory power and predictive ability, and that it outperforms the "Centre Method". The definitions and calculation rules for these numerical characteristics lay the foundation for extending other classical multivariate statistical methods to distribution-valued data.
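The abstract does not state the formulas themselves, so the following is only an illustrative sketch under assumptions that are not taken from the paper. It assumes that each of the n observations of a variable is a normal distribution N(mu_i, sigma_i^2), that the sample is treated as an equal-weight mixture of these distributions when computing moments, and that the "Centre Method" baseline amounts to ordinary least squares on the means alone. The function names `mixture_moments` and `centre_method_fit` are hypothetical.

```python
import numpy as np


def mixture_moments(mu, sigma):
    """First moment and second raw moment of a normal distribution-valued variable.

    Assumption (not from the paper): the n observations N(mu_i, sigma_i^2)
    are pooled as an equal-weight mixture, so that
        E[X]   = mean(mu_i)
        E[X^2] = mean(mu_i^2 + sigma_i^2)
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    first = mu.mean()
    second_raw = (mu ** 2 + sigma ** 2).mean()
    return first, second_raw


def centre_method_fit(mu_x, mu_y):
    """Baseline sketch: OLS of the response means on the predictor means.

    Assumption: the "centre" of a normal distribution is its mean, so the
    spread information (sigma) is discarded entirely.
    Returns the array [intercept, slope].
    """
    mu_x = np.asarray(mu_x, dtype=float)
    mu_y = np.asarray(mu_y, dtype=float)
    design = np.column_stack([np.ones_like(mu_x), mu_x])
    beta, *_ = np.linalg.lstsq(design, mu_y, rcond=None)
    return beta


if __name__ == "__main__":
    # Hypothetical data: three units, each described by a mean and a
    # standard deviation for the predictor, and by a mean for the response.
    mu_x = [0.0, 1.0, 2.0]
    sigma_x = [1.0, 0.5, 2.0]
    mu_y = [1.0, 3.0, 5.0]

    print("moments of X:", mixture_moments(mu_x, sigma_x))
    print("centre-method coefficients:", centre_method_fit(mu_x, mu_y))
```

Under these assumptions the second raw moment depends on the standard deviations while the centre-method fit does not, which illustrates the kind of information a means-only baseline leaves unused.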