Clustering data streams using grid-based synopsis

Authors
Vasudha Bhatnagar
Sharanjit Kaur
Sharma Chakravarthy
Affiliations
[1] University of Delhi, Department of Computer Science
[2] University of Delhi, Department of Computer Science, Acharya Narendra Dev College
[3] The University of Texas, Computer Science and Engineering Department
Keywords
Stream clustering; Synopsis; Micro-cluster; Grid structure; Exclusive clustering; Complete clustering;
DOI
Not available
Abstract
Advances in technology have made it feasible to capture data online and transmit it onward as a steady flow of newly generated data points, termed a data stream. Because data streams are continuous and unbounded, storing the data and scanning it multiple times is impractical for knowledge discovery. The need to learn structure from data in a streaming environment has made clustering a popular technique for knowledge discovery from data streams. The continuous nature of streaming data makes it infeasible to look for point membership among the clusters discovered so far, which necessitates a synopsis structure to consolidate incoming data points. This synopsis is then exploited to build a clustering scheme that meets subsequent user demands. The proposed Exclusive and Complete Clustering (ExCC) algorithm captures non-overlapping clusters in data streams with mixed attributes, such that each point either belongs to some cluster or is an outlier/noise. It deploys a fixed-granularity grid structure as the synopsis and performs clustering by coalescing dense regions in the grid. Speed-based pruning is applied to the synopsis prior to clustering to ensure the currency of the discovered clusters. Extensive experimentation demonstrates that the algorithm is robust, identifies succinct outliers on the fly, and adapts to changes in the data distribution. The ExCC algorithm is further evaluated for performance and compared with other contemporary algorithms.
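The general idea of a grid synopsis can be illustrated with a minimal Python sketch. This is not the ExCC algorithm itself: it assumes numeric attributes only (the abstract mentions mixed attributes), and the class name `GridSynopsis`, the parameters `cell_width`, `min_points` and `max_age`, and the simple age-based pruning rule (a stand-in for the paper's speed-based pruning) are all hypothetical choices made for illustration.

```python
from collections import deque


class GridSynopsis:
    """Illustrative fixed-granularity grid synopsis (hypothetical, not ExCC)."""

    def __init__(self, cell_width, min_points, max_age):
        self.cell_width = cell_width  # assumed: same cell width on every dimension
        self.min_points = min_points  # assumed density threshold for a "dense" cell
        self.max_age = max_age        # assumed staleness limit, counted in arrivals
        self.cells = {}               # cell coordinates -> [point count, last update]
        self.clock = 0                # number of points seen so far

    def cell_of(self, point):
        # Map a numeric point to the integer coordinates of its grid cell.
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point):
        # Consolidate one incoming point into the synopsis; the point is not stored.
        self.clock += 1
        entry = self.cells.setdefault(self.cell_of(point), [0, self.clock])
        entry[0] += 1
        entry[1] = self.clock

    def prune(self):
        # Drop cells that have not received a point recently (assumed rule).
        stale = [key for key, (_, last) in self.cells.items()
                 if self.clock - last > self.max_age]
        for key in stale:
            del self.cells[key]

    def _neighbours(self, cell):
        # Cells that share a face with `cell` (one step along a single axis).
        for axis in range(len(cell)):
            for step in (-1, 1):
                yield cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]

    def clusters(self):
        # Prune first, then coalesce adjacent dense cells with a breadth-first search.
        self.prune()
        dense = {key for key, (count, _) in self.cells.items()
                 if count >= self.min_points}
        labels = {}
        next_label = 0
        for start in sorted(dense):
            if start in labels:
                continue
            labels[start] = next_label
            queue = deque([start])
            while queue:
                cell = queue.popleft()
                for nb in self._neighbours(cell):
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
            next_label += 1
        return labels

    def label(self, point, labels):
        # A point in a labelled cell belongs to exactly one cluster;
        # any other point is reported as an outlier (None).
        return labels.get(self.cell_of(point))
```

In this sketch, every point falls into exactly one cell, so the resulting clusters cannot overlap, and a point whose cell is not part of any dense region is treated as an outlier. That mirrors the "exclusive and complete" property described in the abstract only in spirit; the thresholds and pruning policy here are placeholders.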
Pages: 127-152
Page count: 25
    JOURNAL OF THE SOUTHERN AFRICAN INSTITUTE OF MINING AND METALLURGY, 2014, 114 (10) : 815 - 822