Clustering data streams using grid-based synopsis

Authors
Vasudha Bhatnagar
Sharanjit Kaur
Sharma Chakravarthy
Affiliations
[1] University of Delhi, Department of Computer Science
[2] University of Delhi, Department of Computer Science, Acharya Narendra Dev College
[3] The University of Texas, Computer Science and Engineering Department
Keywords
Stream clustering; Synopsis; Micro-cluster; Grid structure; Exclusive clustering; Complete clustering
Abstract
Continually advancing technology has made it feasible to capture data online for onward transmission as a steady flow of newly generated data points, termed a data stream. The continuity and unboundedness of data streams make storing the data and scanning it multiple times impractical for knowledge discovery. The need to learn structure from data in a streaming environment has made clustering a popular technique for knowledge discovery from data streams. The continuous nature of streaming data makes it infeasible to look for point membership among the clusters discovered so far, necessitating a synopsis structure that consolidates incoming data points. This synopsis is then exploited to build a clustering scheme that meets subsequent user demands. The proposed Exclusive and Complete Clustering (ExCC) algorithm captures non-overlapping clusters in data streams with mixed attributes, such that each point either belongs to some cluster or is an outlier/noise. The algorithm is robust, adapts to changes in data distribution, and detects succinct outliers on the fly. It deploys a fixed-granularity grid structure as the synopsis and performs clustering by coalescing dense regions in the grid. Speed-based pruning is applied to the synopsis prior to clustering to ensure the currency of the discovered clusters. Extensive experimentation demonstrates that the algorithm is robust, identifies succinct outliers on the fly, and adapts to changes in the data distribution. The ExCC algorithm is further evaluated for performance and compared with other contemporary algorithms.
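The grid synopsis idea in the abstract can be illustrated with a minimal sketch. This is not the ExCC algorithm itself: the class name `GridSynopsis`, the parameters `cell_width`, `min_count`, and `max_age`, the face-adjacency rule for coalescing, and the age-based pruning (a simplified stand-in for the speed-based pruning the abstract mentions) are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Cell:
    count: int = 0            # number of points consolidated into this cell
    last_update: float = 0.0  # arrival time of the most recent point


class GridSynopsis:
    """Toy fixed-granularity grid over mixed (numeric + categorical) points."""

    def __init__(self, cell_width):
        self.cell_width = cell_width  # assumed: one width for all numeric attributes
        self.cells = {}               # cell key -> Cell

    def key(self, numeric, categorical):
        # Numeric values are binned; categorical values are kept as-is.
        bins = tuple(int(v // self.cell_width) for v in numeric)
        return bins, tuple(categorical)

    def insert(self, numeric, categorical, t):
        cell = self.cells.setdefault(self.key(numeric, categorical), Cell())
        cell.count += 1
        cell.last_update = t

    def prune(self, now, max_age):
        # Assumption: drop cells by age. The paper's pruning is speed-based.
        self.cells = {k: c for k, c in self.cells.items()
                      if now - c.last_update <= max_age}

    def cluster(self, min_count):
        # Coalesce dense cells that share a face and the same categorical values.
        dense = {k for k, c in self.cells.items() if c.count >= min_count}
        labels, next_id = {}, 0
        for start in dense:
            if start in labels:
                continue
            labels[start] = next_id
            queue = deque([start])
            while queue:
                bins, cats = queue.popleft()
                for d in range(len(bins)):
                    for step in (-1, 1):
                        nb = (bins[:d] + (bins[d] + step,) + bins[d + 1:], cats)
                        if nb in dense and nb not in labels:
                            labels[nb] = next_id
                            queue.append(nb)
            next_id += 1
        return labels

    def assign(self, numeric, categorical, labels):
        # A point in a non-dense or unseen cell is reported as an outlier (None).
        return labels.get(self.key(numeric, categorical))


grid = GridSynopsis(cell_width=1.0)
stream = [((0.2, 0.3), ("a",)), ((0.4, 0.1), ("a",)), ((1.1, 0.5), ("a",)),
          ((1.6, 0.9), ("a",)), ((9.0, 9.0), ("a",))]
for t, (num, cat) in enumerate(stream):
    grid.insert(num, cat, t)
labels = grid.cluster(min_count=2)
```

In this toy run the two adjacent dense cells merge into one cluster, while the isolated point at `(9.0, 9.0)` falls in a sparse cell and is treated as an outlier, which mirrors the exclusive and complete property: every point is either in exactly one cluster or is noise.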
Pages: 127-152
Page count: 25