Online Clustering of Evolving Data Streams Using a Density Grid-Based Method

Cited by: 31
Authors
Tareq, Mustafa [1]
Sundararajan, Elankovan A. [1]
Mohd, Masnizah [2]
Sani, Nor Samsiah [2]
Affiliations
[1] Univ Kebangsaan Malaysia, Ctr Software Technol & Management, Fac Informat Sci & Technol, Bangi 43600, Malaysia
[2] Univ Kebangsaan Malaysia, Ctr Artificial Intelligence Technol, Fac Informat Sci & Technol, Bangi 43600, Malaysia
Source
IEEE Access | 2020 / Vol. 8
Keywords
Clustering algorithms; Real-time systems; Memory management; Software; Shape; Sensors; Social network services; Clustering; data stream; evolving; grid-based method; core-micro-cluster; online; BIG DATA; ITERATIVE FUSION; DATA ANALYTICS; INTERNET; ALGORITHM; THINGS; IOT;
DOI
10.1109/ACCESS.2020.3021684
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, a significant increase in data availability for persistent data streams has been observed. These data streams are continually evolving, with the clusters frequently forming arbitrary shapes instead of regular shapes in the data space. This characteristic leads to an exponential increase in the processing time of traditional clustering algorithms for data streams. In this study, we propose a new online method, a density grid-based method for data stream clustering. The primary objectives of the density grid-based method are to reduce the number of calls to the distance function and to improve the cluster quality. The method runs entirely online and consists of two main phases. The first phase generates the Core Micro-Clusters (CMCs), and the second phase combines the CMCs into macro clusters. The grid is used as an outlier buffer in order to handle multi-density data and noise. The method was tested on real and synthetic data streams using different quality metrics and was compared with a popular method for clustering evolving data streams into arbitrary shapes. The proposed method was shown to be an effective solution for reducing the number of calls to the distance function and improving the cluster quality.
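The two phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no formulas, so the class name `GridStreamSketch`, the parameters `radius` and `min_points`, the absorption rule, the promotion threshold, and the merge rule (centers within `2 * radius`) are all assumptions made for illustration, and time decay of old data is omitted. The sketch only shows the general idea: points that no CMC can absorb are held in a grid buffer, where membership is found by integer division rather than by distance computations.

```python
import math
from collections import defaultdict


class GridStreamSketch:
    """Hypothetical illustration of CMCs plus a grid used as an outlier buffer."""

    def __init__(self, radius=1.0, min_points=3):
        self.radius = radius            # assumed CMC radius and grid cell width
        self.min_points = min_points    # assumed density threshold for promotion
        self.cmcs = []                  # each CMC: {"center": [...], "n": int}
        self.buffer = defaultdict(list) # grid cell -> buffered points
        self.distance_calls = 0         # counts calls to the distance function

    def _dist(self, a, b):
        self.distance_calls += 1
        return math.dist(a, b)

    def _cell(self, p):
        # Grid lookup uses floor division only, so it costs no distance calls.
        return tuple(math.floor(x / self.radius) for x in p)

    def insert(self, p):
        # Phase 1: try to absorb the point into the nearest CMC within radius.
        best, best_d = None, self.radius
        for c in self.cmcs:
            d = self._dist(p, c["center"])
            if d <= best_d:
                best, best_d = c, d
        if best is not None:
            n = best["n"]
            best["center"] = [(m * n + x) / (n + 1)
                              for m, x in zip(best["center"], p)]
            best["n"] = n + 1
            return
        # Otherwise hold the point in the grid buffer.
        key = self._cell(p)
        pts = self.buffer[key]
        pts.append(tuple(p))
        if len(pts) >= self.min_points:
            # A dense cell is promoted to a new CMC centered on its mean.
            center = [sum(xs) / len(pts) for xs in zip(*pts)]
            self.cmcs.append({"center": center, "n": len(pts)})
            del self.buffer[key]

    def macro_clusters(self):
        # Phase 2: connected components over CMCs whose centers are close.
        n = len(self.cmcs)
        labels = [-1] * n
        current = 0
        for i in range(n):
            if labels[i] != -1:
                continue
            labels[i] = current
            stack = [i]
            while stack:
                j = stack.pop()
                for k in range(n):
                    if labels[k] == -1 and self._dist(
                            self.cmcs[j]["center"],
                            self.cmcs[k]["center"]) <= 2 * self.radius:
                        labels[k] = current
                        stack.append(k)
            current += 1
        return labels


if __name__ == "__main__":
    model = GridStreamSketch(radius=1.0, min_points=3)
    stream = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (0.4, 0.4),
              (10.1, 10.1), (10.2, 10.2), (10.3, 10.3), (50.5, 50.5)]
    for point in stream:
        model.insert(point)
    print(len(model.cmcs), model.macro_clusters(), dict(model.buffer))
```

In this toy stream the isolated point stays in the grid buffer instead of forming a cluster, which is one plausible reading of how a grid buffer could keep noise out of the CMCs.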
Pages: 166472-166490
Number of pages: 19