Two-layer clustering over data stream with fault-tolerance
You Yuyang1, Zhu Jihong1, Yang Zhihong2*
1. Department of Computer, Tsinghua University, Beijing 100084, China;
2. Institute of Medicinal Plant Development, CAMS, Beijing 100193, China

Abstract: A new fault-tolerant clustering algorithm for evolving data streams, named FTGDStream (fault-tolerant grid-density clustering over data stream), is proposed. It introduces an appropriate relaxation of conditions in order to discover generalised knowledge in real-world data polluted by noise. First, FTGDStream uses similarity measures and the lifting wavelet to construct the synopsis HLSFTS (hierarchical lifting scheme fault-tolerant synopses), which realizes the online micro-cluster phase. Second, FTGDStream uses grid-density clustering to realize the offline macro-cluster phase. The high compression ratio of HLSFTS in the micro-cluster phase reduces the computational load of the grid-density clustering algorithm in the macro-cluster phase and improves the efficiency of the two-layer algorithm. Simulation on UCI data shows that FTGDStream can find clusters of arbitrary shape in the data space and is suitable for high-dimensional data streams. FTGDStream is thus an efficient clustering algorithm with fault tolerance.
Keywords: evolving data stream clustering; fault-tolerance; synopses; grid density
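
The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' HLSFTS synopsis or the FTGDStream procedure, whose details are not given here. It shows only a generic one-level Haar lifting step (split, predict, update), which is the kind of transform a lifting-wavelet synopsis could compress with, and a generic grid-density grouping of adjacent dense cells. The function names and the parameters `cell_size` and `min_density` are assumptions introduced for illustration.

```python
from collections import defaultdict, deque
from itertools import product


def haar_lifting(signal):
    """One level of the Haar lifting scheme (illustrative, not HLSFTS).

    Returns (approx, detail): approx holds pairwise means, detail holds
    pairwise differences. Assumes an even-length input sequence.
    """
    even, odd = signal[0::2], signal[1::2]
    detail = [o - e for e, o in zip(even, odd)]         # predict step
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update step
    return approx, detail


def grid_density_clusters(points, cell_size, min_density):
    """Group adjacent dense grid cells into clusters (generic sketch).

    cell_size and min_density are hypothetical parameters. The neighbour
    scan visits 3**d cells per cell, so it only suits small d.
    """
    # Count how many points fall into each grid cell.
    counts = defaultdict(int)
    for p in points:
        counts[tuple(int(x // cell_size) for x in p)] += 1
    dense = {cell for cell, n in counts.items() if n >= min_density}

    # Breadth-first search over dense cells that touch each other.
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cell = queue.popleft()
            members.append(cell)
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(members))
    return clusters
```

In a two-layer setting, a step like `haar_lifting` would shrink what the online phase has to store, and a step like `grid_density_clusters` would run offline on the compressed summaries rather than on raw points. Because clusters are formed by joining neighbouring dense cells, their shape is not restricted to spheres, and sparse cells are left out as noise.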
Received 2011-02-07;
You Yuyang, Zhu Jihong, Yang Zhihong. Two-layer clustering over data stream with fault-tolerance[J]. JOURNAL OF BEIJING UNIVERSITY OF AERONAUTICS AND A, 2012, V38(5): 665-669, 674
