Interval data analysis based on empirical distribution function
Abstract: Existing methods for interval data analysis usually assume that the data are uniformly distributed over some interval, an assumption that rarely holds in real data. To address this problem, under the simple assumption that the original data come from a continuous distribution, the distribution function of the original data is estimated by both the empirical cumulative distribution function (ECDF) and kernel estimation, using the fact that a random variable transformed by its own distribution function is uniformly distributed on (0,1). A transformation based on these estimates is designed, and the transformed data are tested against the null hypothesis of a uniform distribution. If the null hypothesis is not rejected, traditional methods of interval data analysis can be applied to the transformed data. The transformation and the test together ensure that the uniformity assumption holds, which keeps the analysis statistically rigorous. Both simulation and a real data example show that results based on data transformed by the ECDF and kernel estimation are more reasonable and have stronger explanatory power.
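The transform-then-test idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `make_ecdf` and `ks_uniform_statistic`, the split into a reference sample and a new sample, the exponential test data, and the choice of the Kolmogorov-Smirnov statistic are all assumptions made for illustration. The critical value 1.36/sqrt(n) is the usual large-sample 5% approximation for the one-sample test and ignores the error from estimating the distribution function.

```python
import bisect
import math
import random


def make_ecdf(reference):
    """Return F_n(x) = (number of reference points <= x) / n."""
    ordered = sorted(reference)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n


def ks_uniform_statistic(u):
    """Kolmogorov-Smirnov distance between the sample u and U(0, 1)."""
    ordered = sorted(u)
    n = len(ordered)
    return max(
        max(i / n - v, v - (i - 1) / n)
        for i, v in enumerate(ordered, start=1)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: both samples come from the same continuous
    # (exponential) distribution, which is clearly not uniform.
    reference = [rng.expovariate(1.0) for _ in range(5000)]
    new_data = [rng.expovariate(1.0) for _ in range(200)]

    # Probability integral transform with the estimated distribution function.
    ecdf = make_ecdf(reference)
    u = [ecdf(x) for x in new_data]

    # Compare the transformed data with U(0, 1).
    d = ks_uniform_statistic(u)
    critical = 1.36 / math.sqrt(len(u))  # approximate 5% critical value
    print(f"D = {d:.4f}, approx. 5% critical value = {critical:.4f}")
```

If the statistic stays below the critical value, uniformity of the transformed data is not rejected, and in the abstract's workflow the interval data analysis would then proceed on the transformed values. A kernel estimate of the distribution function could be swapped in for `make_ecdf` without changing the rest of the sketch.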
Key words:
- interval data
- uniform distribution
- kernel estimation
- empirical distribution
- hypothesis test