Clustering data streams using grid-based synopsis

Authors
Vasudha Bhatnagar
Sharanjit Kaur
Sharma Chakravarthy
Affiliations
[1] University of Delhi, Department of Computer Science
[2] University of Delhi, Department of Computer Science, Acharya Narendra Dev College
[3] The University of Texas, Computer Science and Engineering Department
Source
Knowledge and Information Systems, 2014, 41 (01)
Keywords
Stream clustering; Synopsis; Micro-cluster; Grid structure; Exclusive clustering; Complete clustering;
DOI
Not available
Abstract
Continually advancing technology has made it feasible to capture data online and transmit it onward as a steady flow of newly generated data points, termed a data stream. Because data streams are continuous and unbounded, storing the data and scanning it multiple times is impractical for knowledge discovery. The need to learn structures from data in a streaming environment has made clustering a popular technique for knowledge discovery from data streams. The continuous nature of streaming data makes it infeasible to look for point membership among the clusters discovered so far, which necessitates a synopsis structure that consolidates incoming data points. This synopsis is then exploited to build a clustering scheme that meets subsequent user demands. The proposed Exclusive and Complete Clustering (ExCC) algorithm captures non-overlapping clusters in data streams with mixed attributes, such that each point either belongs to some cluster or is an outlier/noise. It deploys a fixed-granularity grid structure as the synopsis and performs clustering by coalescing dense regions in the grid. Speed-based pruning is applied to the synopsis prior to clustering to ensure the currency of the discovered clusters. Extensive experimentation demonstrates that the algorithm is robust, identifies succinct outliers on the fly, and adapts to changes in the data distribution. The ExCC algorithm is further evaluated for performance and compared with other contemporary algorithms.
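The general idea of a grid synopsis can be illustrated with a minimal Python sketch. It is not the ExCC algorithm from the paper: the class name `GridSynopsis`, the parameters `cells_per_dim`, `min_points`, and `max_age`, the numeric-only attributes, the rule that cells are neighbours when they touch (including diagonally), and the simple recency rule used in place of speed-based pruning are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import product


class GridSynopsis:
    """Fixed-granularity grid over numeric attributes (illustrative only)."""

    def __init__(self, lows, highs, cells_per_dim):
        self.lows = lows                  # lower bound of each attribute
        self.highs = highs                # upper bound of each attribute
        self.k = cells_per_dim            # same number of cells in every dimension
        self.counts = defaultdict(int)    # cell index -> number of points absorbed
        self.last_seen = {}               # cell index -> arrival time of latest point

    def cell_of(self, point):
        """Map a point to the index tuple of the grid cell that contains it."""
        idx = []
        for x, lo, hi in zip(point, self.lows, self.highs):
            i = int((x - lo) / (hi - lo) * self.k)
            idx.append(min(max(i, 0), self.k - 1))  # clamp out-of-range values
        return tuple(idx)

    def insert(self, point, t):
        """Absorb one stream point arriving at time t; the point is not stored."""
        cell = self.cell_of(point)
        self.counts[cell] += 1
        self.last_seen[cell] = t

    def prune(self, now, max_age):
        """Drop cells with no arrivals in the last max_age time units.

        Assumption: a plain recency rule, standing in for the paper's
        speed-based pruning, whose details the abstract does not give.
        """
        stale = [c for c, t in self.last_seen.items() if now - t > max_age]
        for c in stale:
            del self.counts[c]
            del self.last_seen[c]

    def clusters(self, min_points):
        """Coalesce adjacent dense cells into clusters (connected components)."""
        dense = {c for c, n in self.counts.items() if n >= min_points}
        offsets = [o for o in product((-1, 0, 1), repeat=len(self.lows)) if any(o)]
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            queue, component = deque([start]), set()
            while queue:
                cell = queue.popleft()
                component.add(cell)
                for off in offsets:
                    nb = tuple(a + b for a, b in zip(cell, off))
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
            result.append(component)
        return result
```

In this sketch, points that fall in cells below `min_points` belong to no cluster and would be treated as outliers or noise, which mirrors the exclusive and complete property described in the abstract only loosely. Calling `prune` before `clusters` keeps stale regions out of the result.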
Pages: 127 - 152
Page count: 25
Related papers
50 in total
  • [1] Clustering data streams using grid-based synopsis
    Bhatnagar, Vasudha
    Kaur, Sharanjit
    Chakravarthy, Sharma
    KNOWLEDGE AND INFORMATION SYSTEMS, 2014, 41 (01) : 127 - 152
  • [2] Statistical grid-based clustering over data streams
    Park, NH
    Lee, WS
    SIGMOD RECORD, 2004, 33 (01) : 32 - 37
  • [3] Online Clustering of Evolving Data Streams Using a Density Grid-Based Method
    Tareq, Mustafa
    Sundararajan, Elankovan A.
    Mohd, Masnizah
    Sani, Nor Samsiah
    IEEE ACCESS, 2020, 8 : 166472 - 166490
  • [4] A Density Grid-based Clustering Algorithm for Uncertain Data Streams
    Tu, Li
    Cui, Peng
    Tang, Keming
    2013 10TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA 2013), 2013, : 347 - +
  • [5] A Systematic Review of Density Grid-Based Clustering for Data Streams
    Tareq, Mustafa
    Sundararajan, Elankovan A.
    Harwood, Aaron
    Abu Bakar, Azuraliza
    IEEE ACCESS, 2022, 10 : 579 - 596
  • [6] A grid-based clustering algorithm for high-dimensional data streams
    Lu, YS
    Sun, YF
    Xu, GP
    Liu, G
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 824 - 831
  • [7] A grid-based subspace clustering algorithm for high-dimensional data streams
    Sun, Yufen
    Lu, Yansheng
    WEB INFORMATION SYSTEMS - WISE 2006 WORKSHOPS, PROCEEDINGS, 2006, 4256 : 37 - 48
  • [8] Wavelet synopsis based clustering of parallel data streams
    Chen H.-H.
    Shi B.-L.
    Qian J.-B.
    Chen Y.-F.
    Ruan Jian Xue Bao/Journal of Software, 2010, 21 (04) : 644 - 658
  • [9] Grid-Based Clustering Using Boundary Detection
    Du, Mingjing
    Wu, Fuyu
    ENTROPY, 2022, 24 (11)
  • [10] Grid-based clustering over an evolving data stream
    Wan, Renxia
    Chen, Jingchao
    Wang, Lixin
    Su, Xiaoke
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2009, 1 (04) : 393 - 410