Clustering data streams using grid-based synopsis

Authors
Vasudha Bhatnagar
Sharanjit Kaur
Sharma Chakravarthy
Affiliations
[1] University of Delhi, Department of Computer Science
[2] University of Delhi, Department of Computer Science, Acharya Narendra Dev College
[3] The University of Texas, Computer Science and Engineering Department
Keywords
Stream clustering; Synopsis; Micro-cluster; Grid structure; Exclusive clustering; Complete clustering
DOI
Not available
Abstract
Continually advancing technology has made it feasible to capture data online and transmit it onward as a steady flow of newly generated data points, termed a data stream. Because data streams are continuous and unbounded, storing the data and scanning it multiple times is impractical for knowledge discovery. The need to learn structure from data in a streaming environment has made clustering a popular technique for knowledge discovery from data streams. The continuous nature of streaming data makes it infeasible to look up point membership among the clusters discovered so far, which necessitates a synopsis structure that consolidates incoming data points. This synopsis is then used to build a clustering scheme that meets subsequent user demands. The proposed Exclusive and Complete Clustering (ExCC) algorithm captures non-overlapping clusters in data streams with mixed attributes, such that each point either belongs to some cluster or is an outlier/noise. The algorithm is robust, adapts to changes in the data distribution, and detects succinct outliers on the fly. It uses a fixed-granularity grid structure as the synopsis and performs clustering by coalescing dense regions in the grid. Speed-based pruning is applied to the synopsis prior to clustering to ensure the currency of the discovered clusters. Extensive experimentation confirms these properties. The ExCC algorithm is further evaluated for performance and compared with other contemporary algorithms.
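The abstract names three ingredients: a fixed-granularity grid used as a synopsis, pruning of the synopsis before clustering, and clustering by coalescing dense grid regions. The following is a minimal sketch of that general idea, not the ExCC algorithm itself. Everything in it is an assumption made for illustration: the class name `GridSynopsis`, the parameters `cell_width`, `density_threshold`, and `max_idle`, the restriction to numeric attributes, and in particular the pruning rule, which simply drops cells that have not been updated recently and stands in for the speed-based pruning whose details the abstract does not give.

```python
from collections import deque
from itertools import product


class GridSynopsis:
    """Illustrative fixed-granularity grid synopsis (numeric attributes only)."""

    def __init__(self, cell_width, density_threshold, max_idle):
        self.cell_width = cell_width                # assumed: same width in every dimension
        self.density_threshold = density_threshold  # assumed: min points for a dense cell
        self.max_idle = max_idle                    # assumed: stand-in for speed-based pruning
        self.cells = {}  # cell coordinates -> [point count, time of last update]

    def cell_of(self, point):
        # Map a point to the integer coordinates of the grid cell that contains it.
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point, now):
        # Consolidate the point into its cell; the point itself is not stored.
        entry = self.cells.setdefault(self.cell_of(point), [0, now])
        entry[0] += 1
        entry[1] = now

    def prune(self, now):
        # Drop cells that received no point within the last max_idle time units.
        stale = [c for c, (_, last) in self.cells.items() if now - last > self.max_idle]
        for c in stale:
            del self.cells[c]

    def clusters(self):
        # Coalesce adjacent dense cells into clusters (connected components).
        dense = {c for c, (n, _) in self.cells.items() if n >= self.density_threshold}
        labels, next_label = {}, 0
        for start in dense:
            if start in labels:
                continue
            labels[start] = next_label
            queue = deque([start])
            while queue:
                cell = queue.popleft()
                # Neighbours differ by at most one step in every dimension.
                for offset in product((-1, 0, 1), repeat=len(cell)):
                    nb = tuple(a + b for a, b in zip(cell, offset))
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
            next_label += 1
        return labels  # dense cell -> cluster id; any other cell holds outliers/noise
```

In this sketch a point is assigned to the cluster of its cell when that cell is dense and is treated as an outlier otherwise, so clusters are non-overlapping and every point is accounted for. Calling `prune(now)` before `clusters()` keeps stale regions out of the result.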
Pages: 127-152
Number of pages: 25