Online embedding and clustering of evolving data streams

Times cited: 4
Authors
Zubaroglu, Alaettin [1]
Atalay, Volkan [1]
Affiliation
[1] Middle East Tech Univ, Dept Comp Engn, Dumlupinar Bulvari 1, TR-06800 Ankara, Turkey
Keywords
data streams; drift adaptation; drift detection; evolving data streams; stream clustering
DOI
10.1002/sam.11590
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The number of connected devices is steadily increasing, and this trend is expected to continue in the near future. Connected devices continuously generate data streams, which may often be high dimensional and contain concept drift. Clustering is one of the most suitable methods for real-time data stream processing, since it can be applied with relatively little prior information about the data. In addition, data embedding makes it possible to visualize high-dimensional data and may simplify the clustering process. Several data stream clustering algorithms exist in the literature; however, no data stream embedding method exists. Uniform Manifold Approximation and Projection (UMAP) is a data embedding algorithm that is suitable for stationary (stable) data streams, but it cannot adapt to concept drift. In this study, we describe EmCStream, a novel method that applies UMAP to evolving (nonstationary) data streams, detects and adapts to concept drift, and clusters the embedded data instances using a distance- or partitioning-based clustering algorithm. We evaluated EmCStream against state-of-the-art stream clustering algorithms on both synthetic and real data streams containing concept drift. EmCStream outperforms DenStream and CluStream in terms of clustering quality on both synthetic and real evolving data streams.
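The abstract summarizes the EmCStream pipeline (embed the stream, cluster in the embedded space, detect drift and adapt) without giving its details. The Python sketch below illustrates only the general shape of such an embed-then-cluster loop under stated assumptions; it is not the authors' implementation. Assumptions: UMAP is swapped for a seeded Gaussian random projection so that the example runs with the standard library alone; plain k-means (Lloyd's algorithm) stands in for the distance- or partitioning-based clustering algorithm; and drift is flagged by a hypothetical heuristic that compares the mean nearest-centroid distance of each batch of points against `threshold` times the value measured when the model was last fitted. The names `RandomProjectionEmbedder`, `DriftAwareStreamClusterer`, `window`, `batch` and `threshold` are illustrative only.

```python
import math
import random
from collections import deque


class RandomProjectionEmbedder:
    """Hypothetical stand-in for UMAP: centre on the window mean, then apply a
    seeded Gaussian random projection. It is linear, unlike UMAP."""

    def __init__(self, out_dim=2, seed=0):
        self.out_dim, self.seed = out_dim, seed
        self.mean, self.matrix = None, None

    def fit(self, window):
        dim = len(window[0])
        self.mean = [sum(x[j] for x in window) / len(window) for j in range(dim)]
        rng = random.Random(self.seed)
        scale = 1.0 / math.sqrt(self.out_dim)
        self.matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
                       for _ in range(self.out_dim)]
        return self

    def transform(self, x):
        centred = [a - m for a, m in zip(x, self.mean)]
        return [sum(w * c for w, c in zip(row, centred)) for row in self.matrix]


def nearest(p, centroids):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; an empty cluster keeps its previous centroid."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:
                centroids[i] = [sum(c) / len(g) for c in zip(*g)]
    return centroids


class DriftAwareStreamClusterer:
    """Hypothetical embed-then-cluster loop; the drift test is a naive
    heuristic, not EmCStream's actual criterion."""

    def __init__(self, k, window=200, batch=50, threshold=2.0):
        self.k, self.window, self.batch, self.threshold = k, window, batch, threshold
        self.recent = deque(maxlen=window)  # most recent raw points, used for (re)fitting
        self.errors = []                    # nearest-centroid distances in the current batch
        self.embedder = self.centroids = self.baseline = None
        self.refits = 0

    def _fit(self):
        data = list(self.recent)
        self.embedder = RandomProjectionEmbedder().fit(data)
        emb = [self.embedder.transform(x) for x in data]
        self.centroids = kmeans(emb, self.k)
        # Reference level: mean distance to the nearest centroid on the fitted window.
        self.baseline = sum(math.dist(e, self.centroids[nearest(e, self.centroids)])
                            for e in emb) / len(emb)
        self.errors = []
        self.refits += 1

    def process(self, x):
        """Return a cluster label for x, or None while the first window fills."""
        self.recent.append(x)
        if self.embedder is None:
            if len(self.recent) == self.window:
                self._fit()
            return None
        e = self.embedder.transform(x)
        label = nearest(e, self.centroids)
        self.errors.append(math.dist(e, self.centroids[label]))
        if len(self.errors) == self.batch:
            if sum(self.errors) / self.batch > self.threshold * self.baseline:
                self._fit()  # suspected drift: re-embed and re-cluster the recent window
            else:
                self.errors = []
        return label


def sample(rng, centres, dim=5):
    """One point from an isotropic Gaussian around (c, ..., c), c chosen at random."""
    c = rng.choice(centres)
    return [rng.gauss(c, 1.0) for _ in range(dim)]


rng = random.Random(42)
stream = [sample(rng, [0.0, 10.0]) for _ in range(600)]        # stationary regime
stream += [sample(rng, [1000.0, 1010.0]) for _ in range(300)]  # abrupt concept drift
model = DriftAwareStreamClusterer(k=2)
labels = [model.process(x) for x in stream[:600]]
refits_before_drift = model.refits
labels += [model.process(x) for x in stream[600:]]
print("refits before drift:", refits_before_drift, "total:", model.refits)
```

Because the embedder exposes `fit` and `transform`, a real UMAP model (for example the `UMAP` class from the umap-learn package, which provides both methods) could in principle be substituted, although its non-linear embedding behaves quite differently from a random projection. The fixed-size `deque` means a refit after suspected drift uses only the latest `window` points rather than the whole stream history.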
Pages: 29-44
Page count: 16
Related papers
(10 of 50 shown)
  • [1] Online Clustering for Evolving Data Streams with Online Anomaly Detection
    Chenaghlou, Milad
    Moshtaghi, Masud
    Leckie, Christopher
    Salehi, Mahsa
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II, 2018, 10938: 506-519
  • [2] Online Sparse Representation Clustering for Evolving Data Streams
    Chen, Jie
    Yang, Shengxiang
    Fahy, Conor
    Wang, Zhu
    Guo, Yinan
    Chen, Yingke
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023: 1-15
  • [3] CPOCEDS-concept preserving online clustering for evolving data streams
    Jafseer, K. T.
    Shailesh, S.
    Sreekumar, A.
    [J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27(03): 2983-2998
  • [4] Fully online clustering of evolving data streams into arbitrarily shaped clusters
    Hyde, Richard
    Angelov, Plamen
    MacKenzie, A. R.
    [J]. INFORMATION SCIENCES, 2017, 382: 96-114
  • [5] Dynamically Evolving Clustering for Data Streams
    Baruah, Rashmi Dutta
    Angelov, Plamen
    Baruah, Diganta
    [J]. 2014 IEEE CONFERENCE ON EVOLVING AND ADAPTIVE INTELLIGENT SYSTEMS (EAIS), 2014
  • [6] Clustering over Evolving Data Streams Based on Online Recent-Biased Approximation
    Fan, Wei
    Koyanagi, Yusuke
    Asakura, Koichi
    Watanabe, Toyohide
    [J]. KNOWLEDGE ACQUISITION: APPROACHES, ALGORITHMS AND APPLICATIONS, 2009, 5465: 12+
  • [7] Online Clustering of Evolving Data Streams Using a Density Grid-Based Method
    Tareq, Mustafa
    Sundararajan, Elankovan A.
    Mohd, Masnizah
    Sani, Nor Samsiah
    [J]. IEEE ACCESS, 2020, 8: 166472-166490
  • [8] Sparse Subspace Clustering for Evolving Data Streams
    Sui, Jinping
    Liu, Zhen
    Liu, Li
    Jung, Alexander
    Liu, Tianpeng
    Peng, Bo
    Li, Xiang
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 7455-7459
  • [9] Online clustering of parallel data streams
    Beringer, Juergen
    Huellermeier, Eyke
    [J]. DATA & KNOWLEDGE ENGINEERING, 2006, 58(02): 180-204
  • [10] Robust Clustering for Tracking Noisy Evolving Data Streams
    Nasraoui, Olfa
    Rojas, Carlos
    [J]. PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2006: 619-623