Research on Parallel Data Stream Clustering Algorithm based on Grid and Density

Cited by: 5
Authors
Hu, Weihua [1]
Cheng, Mingzhong [1]
Wu, Guoping [1]
Wu, Liang [1]
Affiliation
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Keywords
Data stream mining; Clustering algorithm; Grid and density; Distributed; Map-Reduce; Parallel computing
DOI
10.1109/CSMA.2015.21
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the emergence of big data and cloud computing, data streams arrive rapidly, continuously, and at large scale, and real-time clustering analysis of data streams has become a hot topic in data stream mining. Some existing data stream clustering algorithms cannot deal effectively with high-dimensional data streams, cannot find clusters of arbitrary shape in real time, and cannot remove noise points in a timely manner. To address these issues, this paper proposes PGDC-Stream, an algorithm based on grid and density for clustering data streams in a parallel distributed environment [4]. The algorithm adopts a density threshold function to handle noise points, inspecting and removing them periodically. It can also find clusters of arbitrary shape in large-scale data streams in real time. The Map-Reduce framework is used for parallel cluster analysis of the data streams.
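The abstract does not give the details of PGDC-Stream, so the following is only a minimal sketch of the general grid-and-density idea it names, not the authors' algorithm. Every name and constant here (`CELL`, `DECAY`, `DENSE`, `SPARSE`, `update`, `clusters`) is a hypothetical choice made for illustration: points are mapped to grid cells, per-cell counts are summed in a map/reduce style, older densities decay, cells that fall below a threshold are pruned as noise, and adjacent dense cells are merged into clusters of arbitrary shape.

```python
from collections import defaultdict
from itertools import product

CELL = 1.0    # grid cell side length (assumed value)
DECAY = 0.9   # per-batch density decay factor (assumed value)
DENSE = 3.0   # a cell with density >= DENSE counts as dense (assumed value)
SPARSE = 0.5  # a cell with density < SPARSE is pruned as noise (assumed value)


def map_points(points):
    """Map step: emit (cell, 1) for every point in one partition."""
    return [(tuple(int(x // CELL) for x in p), 1) for p in points]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each cell."""
    counts = defaultdict(int)
    for cell, n in pairs:
        counts[cell] += n
    return counts


def update(density, partitions):
    """Decay old densities, add one batch of partitions, prune sparse cells."""
    for cell in density:
        density[cell] *= DECAY
    pairs = [kv for part in partitions for kv in map_points(part)]
    for cell, n in reduce_counts(pairs).items():
        density[cell] = density.get(cell, 0.0) + n
    for cell in [c for c, d in density.items() if d < SPARSE]:
        del density[cell]
    return density


def clusters(density):
    """Group dense cells that touch each other (including diagonally)."""
    dense = {c for c, d in density.items() if d >= DENSE}
    seen, result = set(), []
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:
            cell = stack.pop()
            group.add(cell)
            # Visit every neighbouring cell offset by -1, 0, or +1 per axis.
            for off in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(a + b for a, b in zip(cell, off))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(group)
    return result
```

In this sketch each list passed to `update` stands in for one worker's partition of the stream; in a real Map-Reduce deployment the map and reduce steps would run on separate nodes. The neighbour scan visits 3^d cells for d dimensions, so a high-dimensional stream would need a different adjacency strategy.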
Pages: 70-75
Page count: 6
Related papers
  • [1] Clustering Algorithm Based on Grid and Density for Data Stream
    Wang, Lang
    Li, Haiqing
    [J]. MATERIALS SCIENCE, ENERGY TECHNOLOGY, AND POWER ENGINEERING I, 2017, 1839
  • [2] The research on data stream clustering algorithm based on active grid-density
    Department of Mathematics and Computer Science, Tongling University, Tongling, China
    [J]. Zhong, Z, 1600, Asian Research Publishing Network (ARPN) (44):
  • [3] A Clustering Algorithm Based on Density-Grid for Stream Data
    Zhang, Dandan
    Tian, Hui
    Sang, Yingpeng
    Li, Yidong
    Wu, Yanbo
    Wu, Jun
    Shen, Hong
    [J]. 2012 13TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS, AND TECHNOLOGIES (PDCAT 2012), 2012: 398-403
  • [4] A Data Stream Clustering Algorithm Based on Density and Extended Grid
    Hua, Zheng
    Du, Tao
    Qu, Shouning
    Mou, Guodong
    [J]. INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2017, PT II, 2017, 10362: 689-699
  • [5] A Density Granularity Grid Clustering Algorithm Based on Data Stream
    Wang, Li-fang
    Han, Xie
    [J]. EMERGING RESEARCH IN WEB INFORMATION SYSTEMS AND MINING, 2011, 238: 113-120
  • [6] A Kind of Data Stream Clustering Algorithm Based on Grid-Density
    Zhong Zhishui
    [J]. ADVANCES IN COMPUTER SCIENCE, ENVIRONMENT, ECOINFORMATICS, AND EDUCATION, PT II, 2011, 215: 418-423
  • [7] A Grid and Density-based Clustering Algorithm for Processing Data Stream
    Jia, Chen
    Tan, ChengYu
    Yong, Ai
    [J]. SECOND INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING: WGEC 2008, PROCEEDINGS, 2008: 517-+
  • [8] A density grid-based uncertain data stream clustering algorithm
    [J]. Zhao, J. (jintianzhao@yahoo.com), 1600, Binary Information Press (10):
  • [9] Stream Data Clustering Based on Grid Density and Attraction
    Tu, Li
    Chen, Yixin
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2009, 3 (03)
  • [10] An adaptive grid-density based data stream clustering algorithm based on uncertainty model
    Liu, Zhuo
    Yang, Yue
    Zhang, Jianpei
    Yang, Jing
    Chu, Yan
    Zhang, Zebao
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2014, 51 (11): 2518-2527