Stream Data Clustering Based on Grid Density and Attraction

Cited by: 97
Authors
Tu, Li [1 ]
Chen, Yixin [2 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Inst Informat Sci & Technol, Nanjing 210016, Peoples R China
[2] Washington Univ, St Louis, MO 63130 USA
Keywords
Stream data; data mining; clustering; density-based algorithms
DOI
10.1145/1552303.1552305
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Clustering real-time stream data is an important and challenging problem. Existing algorithms such as CluStream are based on the k-means algorithm. These clustering algorithms have difficulty finding clusters of arbitrary shapes and handling outliers. Further, they require knowledge of k and a user-specified time window. To address these issues, this article proposes D-Stream, a framework for clustering stream data using a density-based approach. Our algorithm uses an online component that maps each input data record into a grid and an offline component that computes the grid density and clusters the grids based on the density. The algorithm adopts a density decaying technique to capture the dynamic changes of a data stream and an attraction-based mechanism to accurately generate cluster boundaries. Exploiting the intricate relationships among the decay factor, attraction, data density, and cluster structure, our algorithm can efficiently and effectively generate and adjust the clusters in real time. Further, a theoretically sound technique is developed to detect and remove sporadic grids mapped by outliers in order to dramatically improve the space and time efficiency of the system. The technique makes high-speed data stream clustering feasible without degrading the clustering quality. The experimental results show that our algorithm has superior quality and efficiency, can find clusters of arbitrary shapes, and can accurately recognize the evolving behaviors of real-time data streams.
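The online component and the density decaying technique described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `DecayingGrid`, the uniform `cell_width`, the `decay` value, and the update rule (multiply the stored density by `decay ** elapsed`, then add 1 for the new record) are all assumptions chosen for illustration, and the attraction mechanism and sporadic-grid removal are omitted.

```python
from collections import defaultdict


class DecayingGrid:
    """Toy online component: maps records to grid cells and keeps a decayed density.

    All parameters and the update rule are illustrative assumptions,
    not values or formulas taken from the abstract.
    """

    def __init__(self, cell_width=1.0, decay=0.998):
        self.cell_width = cell_width      # hypothetical uniform cell width
        self.decay = decay                # hypothetical decay factor in (0, 1)
        self.density = defaultdict(float)  # cell key -> density at last update
        self.last_update = {}             # cell key -> time of last update

    def cell_of(self, record):
        """Return the integer cell coordinates that contain the record."""
        return tuple(int(x // self.cell_width) for x in record)

    def insert(self, record, t):
        """Decay the cell's stored density to time t, then count the new record."""
        key = self.cell_of(record)
        elapsed = t - self.last_update.get(key, t)
        self.density[key] = self.density[key] * self.decay ** elapsed + 1.0
        self.last_update[key] = t
        return key

    def density_at(self, key, t):
        """Density of a cell at time t, decayed since its last update."""
        if key not in self.last_update:
            return 0.0
        return self.density[key] * self.decay ** (t - self.last_update[key])

    def dense_cells(self, t, threshold):
        """Cells whose decayed density at time t is at least the threshold."""
        return [k for k in self.last_update if self.density_at(k, t) >= threshold]


grid = DecayingGrid(cell_width=1.0, decay=0.5)
first = grid.insert((0.2, 0.7), t=0)
second = grid.insert((0.9, 0.1), t=1)   # same cell, one time step later
other = grid.insert((-0.5, 2.3), t=2)   # a different cell
```

Storing only the density and the last update time per cell means a cell is touched only when a record lands in it; an offline step could then call a helper such as `dense_cells` periodically and group neighbouring dense cells into clusters.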
Pages: 27
Related papers
50 in total
  • [1] Clustering Algorithm Based on Grid and Density for Data Stream
    Wang, Lang
    Li, Haiqing
    [J]. MATERIALS SCIENCE, ENERGY TECHNOLOGY, AND POWER ENGINEERING I, 2017, 1839
  • [2] A Density Granularity Grid Clustering Algorithm Based on Data Stream
    Wang, Li-fang
    Han, Xie
    [J]. EMERGING RESEARCH IN WEB INFORMATION SYSTEMS AND MINING, 2011, 238 : 113 - 120
  • [3] A Data Stream Clustering Algorithm Based on Density and Extended Grid
    Hua, Zheng
    Du, Tao
    Qu, Shouning
    Mou, Guodong
    [J]. INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2017, PT II, 2017, 10362 : 689 - 699
  • [4] A Clustering Algorithm Based on Density-Grid for Stream Data
    Zhang, Dandan
    Tian, Hui
    Sang, Yingpeng
    Li, Yidong
    Wu, Yanbo
    Wu, Jun
    Shen, Hong
    [J]. 2012 13TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS, AND TECHNOLOGIES (PDCAT 2012), 2012, : 398 - 403
  • [5] A Grid and Density-based Clustering Algorithm for Processing Data Stream
    Jia, Chen
    Tan, ChengYu
    Yong, Ai
    [J]. SECOND INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING: WGEC 2008, PROCEEDINGS, 2008, : 517 - +
  • [6] A Kind of Data Stream Clustering Algorithm Based on Grid-Density
    Zhong Zhishui
    [J]. ADVANCES IN COMPUTER SCIENCE, ENVIRONMENT, ECOINFORMATICS, AND EDUCATION, PT II, 2011, 215 : 418 - 423
  • [7] Research on Parallel Data Stream Clustering Algorithm based on Grid and Density
    Hu, Weihua
    Cheng, Mingzhong
    Wu, Guoping
    Wu, Liang
    [J]. 2015 International Conference on Computer Science and Mechanical Automation (CSMA), 2015, : 70 - 75
  • [8] Data Stream Clustering Based on Grid Coupling
    Zhang, Dong-Yue
    Zhou, Li-Hua
    Wu, Xiang-Yun
    Zhao, Li-Hong
    [J]. Ruan Jian Xue Bao/Journal of Software, 2019, 30 (03): 667 - 683
  • [9] A Density-Grid Based Clustering Algorithm on Data Stream Using Resilient Distributed Datasets
    Zhang, Yuan
    Zhang, Jiongmin
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2016, 2016, 9673 : 316 - 322
  • [10] An Incremental Algorithm Based on Irregular Grid for Clustering Data Stream
    Yin, Guisheng
    Yu, Xiang
    Yang, Guang
    [J]. 2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008, : 5680 - 5684