Research on Parallel Data Stream Clustering Algorithm based on Grid and Density

Cited by: 5
Authors
Hu, Weihua [1]
Cheng, Mingzhong [1]
Wu, Guoping [1]
Wu, Liang [1]
Affiliation
[1] Hangzhou Dianzi University, School of Computer Science and Technology, Hangzhou, Zhejiang, People's Republic of China
Keywords
Data stream mining; Clustering algorithm; Grid and density; Distributed; Map-Reduce; Parallel computing
DOI
10.1109/CSMA.2015.21
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
With the emergence of big data and cloud computing, data streams arrive rapidly, continuously, and at large scale, and real-time data stream clustering has become a hot topic in current data stream mining research. Some existing data stream clustering algorithms cannot effectively handle high-dimensional data streams, cannot find clusters of arbitrary shape in real time, and cannot remove noise points in a timely manner. To address these issues, this paper proposes PGDC-Stream, a grid- and density-based algorithm for clustering data streams in a parallel distributed environment [4]. The algorithm uses a density threshold function to handle noise points, periodically inspecting and removing them. It can also find clusters of arbitrary shape in large-scale data streams in real time. The Map-Reduce framework is used for the parallel cluster analysis of data streams.
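The abstract does not give PGDC-Stream's actual formulas, so what follows is only a minimal single-machine sketch of the general grid-and-density idea it describes. It follows the approach of grid-based stream clusterers such as D-Stream. Points are mapped to grid cells in a "map" step. Per-cell counts are summed in a "reduce" step. Cell densities decay over time. Cells below a noise threshold are removed periodically. Dense, face-adjacent cells are joined into clusters of arbitrary shape. All names and parameters (`CELL_SIZE`, `DECAY`, `DENSE_THRESHOLD`, `SPARSE_THRESHOLD`, `process_batch`, and so on) are hypothetical. The fixed `SPARSE_THRESHOLD` is a simplified stand-in for the paper's density threshold function. The map/reduce steps run in-process here, not on a real MapReduce cluster.

```python
from collections import defaultdict

# Hypothetical parameters chosen for illustration only (not taken from the paper).
CELL_SIZE = 1.0         # side length of every grid cell
DECAY = 0.9             # density decay factor applied once per batch (time step)
DENSE_THRESHOLD = 3.0   # a cell at or above this density is "dense"
SPARSE_THRESHOLD = 0.8  # a cell below this density is treated as noise and dropped


def map_phase(points):
    """Map: emit (grid-cell key, 1.0) for every point in this worker's chunk."""
    for p in points:
        # Floor division maps each coordinate to an integer cell index.
        yield tuple(int(c // CELL_SIZE) for c in p), 1.0


def reduce_phase(pairs):
    """Reduce: sum the contributions emitted for each grid cell."""
    counts = defaultdict(float)
    for key, value in pairs:
        counts[key] += value
    return counts


def process_batch(grid, points, n_workers=2):
    """Split a batch across workers, run map/reduce, then update decayed densities."""
    chunks = [points[i::n_workers] for i in range(n_workers)]
    # The map calls are independent, so a MapReduce framework could run them on
    # separate nodes; here they simply run one after another.
    pairs = [kv for chunk in chunks for kv in map_phase(chunk)]
    for key in grid:
        grid[key] *= DECAY  # older points count for less
    for key, count in reduce_phase(pairs).items():
        grid[key] = grid.get(key, 0.0) + count


def remove_sparse(grid):
    """Periodic noise check: delete cells whose density fell below the threshold."""
    for key in [k for k, d in grid.items() if d < SPARSE_THRESHOLD]:
        del grid[key]


def cluster(grid):
    """Label connected components of dense cells that share a face."""
    dense = {k for k, d in grid.items() if d >= DENSE_THRESHOLD}
    labels, next_label = {}, 0
    for start in dense:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            cell = stack.pop()
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        stack.append(nb)
        next_label += 1
    return labels


if __name__ == "__main__":
    grid = {}
    batch = [(0.1, 0.1), (0.2, 0.3), (0.5, 0.5), (0.7, 0.2), (0.9, 0.9),
             (1.1, 0.1), (1.2, 0.5), (1.5, 0.5), (1.7, 0.8), (1.9, 0.3),
             (10.1, 10.2), (10.3, 10.4), (10.5, 10.6), (10.7, 10.8), (10.9, 10.1),
             (5.5, 5.5)]  # a lone noise point
    process_batch(grid, batch)
    print(len(set(cluster(grid).values())))  # 2
    for _ in range(3):
        process_batch(grid, [])  # empty batches: densities only decay
    remove_sparse(grid)
    print((5, 5) in grid, len(set(cluster(grid).values())))  # False 2
```

In this example, the two adjacent cells near the origin merge into one cluster, and the cell around (10, 10) forms a second cluster. The isolated point never becomes dense, and after three decayed time steps it drops below the noise threshold and is deleted. Decaying old densities and periodically pruning sparse cells keeps memory bounded on an unbounded stream. Connecting neighbouring dense cells is what allows non-convex cluster shapes.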
Pages: 70-75
Number of pages: 6