Abstract: Intelligent satellite technology places growing demands on data mining for satellite time series. Satellite data sets are typically so large that processing them is computationally expensive, and serial execution takes a long time. Taking the multi-type feature analysis of satellite anomaly processes as a representative case, this paper considers common data mining operations, including window segmentation with vector similarity computation, feature extraction, Fourier transformation, and clustering. It discusses several strategies for the parallel optimization of time series data mining on a typical heterogeneous computing node with multi-core CPUs and GPUs: vectorization, multi-process parallelization, and GPU computation. The applicability of these strategies is tested and compared experimentally. The results show that combining several strategies according to the task at hand yields a significant speedup.
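To make the vectorization strategy concrete, the following is a minimal sketch of window segmentation followed by a similarity computation between consecutive windows, written once with explicit Python loops and once with NumPy array operations. It is not the paper's implementation: the use of Python and NumPy, the choice of cosine similarity, the function names, and the reading of the offset as the step between window starts are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_similarity_loop(series, width, offset):
    """Cosine similarity of consecutive windows, using explicit Python loops.

    Hypothetical baseline: `width` is the window length and `offset` the
    step between window starts (both names are assumptions).
    """
    starts = range(0, len(series) - width + 1, offset)
    windows = [series[s:s + width] for s in starts]
    sims = []
    for a, b in zip(windows[:-1], windows[1:]):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x * x for x in a) ** 0.5
        norm_b = sum(y * y for y in b) ** 0.5
        sims.append(dot / (norm_a * norm_b))
    return np.array(sims)


def window_similarity_vectorized(series, width, offset):
    """Same result, but every window pair is handled by array operations."""
    # All length-`width` windows as a strided view, then keep every `offset`-th.
    windows = sliding_window_view(series, width)[::offset]
    a, b = windows[:-1], windows[1:]
    # Row-wise dot products and norms, computed for all pairs at once.
    dots = np.einsum("ij,ij->i", a, b)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a telemetry channel (not real satellite data).
    series = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)
    slow = window_similarity_loop(series, width=50, offset=10)
    fast = window_similarity_vectorized(series, width=50, offset=10)
    print(np.allclose(slow, fast))
```

Both functions return the same values; the vectorized version moves the per-element work out of the Python interpreter and into compiled array routines, which is the kind of serial optimization the abstract refers to as vectorization.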
Keywords:
- aerospace big data
- data mining
- intelligent satellite
- parallelization
- GPU
Table 1. Run time of the adaptive period acquisition algorithm before and after serial-code optimization

| Data size | Time before optimization (s) | Time after optimization (s) | Speedup |
| --- | --- | --- | --- |
| 221,280 | 25.1508 | 1.6256 | 15.5 |
| 490,440 | 127.2525 | 2.1814 | 58.3 |
| 1,028,760 | 562.1715 | 2.5923 | 216.9 |
| 2,105,400 | 2,306.4696 | 3.7552 | 614.2 |
| 6,094,920 | 19,213.1193 | 6.0797 | 3,160.2 |
| 12,238,320 | 78,204.0426 | 16.8001 | 4,655.0 |

Table 2. Run time before and after optimization with different methods

| Offset δ | No optimization (s) | Serial optimization (s) | Best parallel (s) |
| --- | --- | --- | --- |
| 10 | 95.2851 | 33.5918 | 18.4036 |
| 30 | 58.5769 | 11.6972 | 10.0488 |
| 90 | 19.2697 | 4.0383 | 5.7167 |
| 180 | 9.6326 | 1.9752 | 4.0208 |
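The speedup column of Table 1 and the comparison in Table 2 can be checked with a few lines of arithmetic. The sketch below assumes that speedup means time before optimization divided by time after; the timings are copied from the two tables, and the variable and function names are illustrative only.

```python
# Timings in seconds copied from Table 1: data size -> (before, after).
TABLE1 = {
    221_280: (25.1508, 1.6256),
    490_440: (127.2525, 2.1814),
    1_028_760: (562.1715, 2.5923),
    2_105_400: (2306.4696, 3.7552),
    6_094_920: (19213.1193, 6.0797),
    12_238_320: (78204.0426, 16.8001),
}

# Timings in seconds copied from Table 2:
# offset -> (no optimization, serial optimization, best parallel).
TABLE2 = {
    10: (95.2851, 33.5918, 18.4036),
    30: (58.5769, 11.6972, 10.0488),
    90: (19.2697, 4.0383, 5.7167),
    180: (9.6326, 1.9752, 4.0208),
}


def speedup(before, after):
    """Speedup taken as the ratio of run times (assumed definition)."""
    return before / after


def faster_variant(serial, parallel):
    """Name of the optimized variant with the shorter run time."""
    return "parallel" if parallel < serial else "serial"


if __name__ == "__main__":
    for size, (before, after) in TABLE1.items():
        print(f"{size:>12,d}  speedup {speedup(before, after):8.1f}")
    for offset, (_, serial, parallel) in TABLE2.items():
        print(f"offset {offset:>3d}: faster variant is {faster_variant(serial, parallel)}")
```

Read this way, Table 2 shows the best parallel variant ahead of the serially optimized one at offsets 10 and 30 and behind it at offsets 90 and 180, which matches the abstract's point that the appropriate strategy depends on the task.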