Incremental algorithm of multiple linear regression model
-
Abstract: With the development of information technology in many fields, data increasingly arrive rapidly and continuously. For massive, continuously updated data sets, this paper introduces an incremental algorithm into the widely used multiple linear regression model. Based on an incremental formula for the cross-product matrix, an incremental algorithm for the least squares estimator is derived and then extended to other model estimators and test statistics. Because the incremental algorithm uses all of the data, it gives exactly the same results as fitting the model to the full data set at once. It saves data-reading time and relieves the burden of data storage and transmission, thereby improving computational efficiency. Simulation experiments verify the effectiveness of the algorithm.
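The cross-product idea can be illustrated with a minimal sketch. It assumes plain Python, a design matrix whose first column is an intercept, and a hypothetical class `IncrementalOLS` that is not taken from the paper. Only n, X'X, X'y, y'y and the sum of y are stored; each new batch is added in (X'X ← X'X + X_new'X_new, X'y ← X'y + X_new'y_new), so solving the normal equations (X'X)b = X'y gives the same least squares estimate as a full-data fit. The residual sum of squares SSE = y'y − b'X'y, the error variance estimate, R² and the overall F statistic follow from the same sums.

```python
from typing import List, Sequence


class IncrementalOLS:
    """Hypothetical helper: keeps only the OLS sufficient statistics
    (n, X'X, X'y, y'y, sum of y), so each batch is read once and discarded."""

    def __init__(self, p: int) -> None:
        self.p = p                                # columns of X, intercept first
        self.n = 0                                # observations seen so far
        self.xtx = [[0.0] * p for _ in range(p)]  # cross-product matrix X'X
        self.xty = [0.0] * p                      # X'y
        self.yty = 0.0                            # y'y
        self.ysum = 0.0                           # sum of y, for the total sum of squares

    def update(self, X_new: Sequence[Sequence[float]], y_new: Sequence[float]) -> None:
        # Incremental step: add the new batch's cross products to the running sums.
        for row, yi in zip(X_new, y_new):
            for j in range(self.p):
                self.xty[j] += row[j] * yi
                for k in range(self.p):
                    self.xtx[j][k] += row[j] * row[k]
            self.yty += yi * yi
            self.ysum += yi
            self.n += 1

    def coef(self) -> List[float]:
        # Solve (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.
        p = self.p
        a = [self.xtx[i][:] + [self.xty[i]] for i in range(p)]
        for c in range(p):
            piv = max(range(c, p), key=lambda r: abs(a[r][c]))
            a[c], a[piv] = a[piv], a[c]
            for r in range(p):
                if r != c:
                    f = a[r][c] / a[c][c]
                    for k in range(c, p + 1):
                        a[r][k] -= f * a[c][k]
        return [a[i][p] / a[i][i] for i in range(p)]

    def stats(self) -> dict:
        # Test statistics from the same sums (requires n > p and an intercept column).
        b = self.coef()
        sse = self.yty - sum(bj * v for bj, v in zip(b, self.xty))  # y'y - b'X'y
        sst = self.yty - self.ysum ** 2 / self.n                     # y'y - n * ybar^2
        df_res = self.n - self.p
        ssr = sst - sse
        return {
            "sigma2": sse / df_res,
            "r2": 1 - sse / sst,
            "F": (ssr / (self.p - 1)) / (sse / df_res),
        }


# Usage: fit on two batches incrementally, and on all data in one pass.
X = [[1.0, float(i), float(i * i % 7)] for i in range(1, 21)]
y = [1.0 + 2.0 * r[1] - 0.5 * r[2] + ((i * 37) % 11 - 5) / 10 for i, r in enumerate(X, 1)]

inc = IncrementalOLS(p=3)
inc.update(X[:8], y[:8])   # first batch arrives
inc.update(X[8:], y[8:])   # second batch arrives; the first batch is no longer stored
full = IncrementalOLS(p=3)
full.update(X, y)          # single pass over the whole data set
print(inc.coef(), full.coef(), inc.stats())
```

Memory here depends only on p, not on n. A production version would more likely solve the normal equations with a library Cholesky factorization than with hand-written elimination.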
-
Key words:
- linear regression model
- incremental algorithm
- cross product matrix
- estimation
- test