ESA-Stream: Efficient Self-Adaptive Online Data Stream Clustering

Cited by: 18
Authors
Li, Yanni [1]
Li, Hui [2]
Wang, Zhi [1]
Liu, Bing [3,4]
Cui, Jiangtao [1]
Fei, Hang [1]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
[3] Peking Univ, Wangxuan Inst Comp Technol, Beijing 100871, Peoples R China
[4] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China
Keywords
Clustering algorithms; Heuristic algorithms; Real-time systems; Partitioning algorithms; Dimensionality reduction; Clustering methods; Indexes; Self-adaptive; data stream; online clustering;
DOI
10.1109/TKDE.2020.2990196
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many big data applications produce massive amounts of high-dimensional, real-time, and evolving streaming data. Clustering such data streams both effectively and efficiently is critical for these applications. Although there are well-known data stream clustering algorithms based on the popular online-offline framework, these algorithms still face major challenges. Several critical questions have not been answered satisfactorily: How can dimensionality reduction be performed effectively and efficiently in an online dynamic environment? How can a clustering algorithm achieve fully real-time online processing? How can algorithm parameters be learned in a self-supervised or self-adaptive manner to cope with high-speed evolving streams? In this paper, we tackle these challenges by proposing a fully online data stream clustering algorithm, called ESA-Stream, that learns its parameters online and dynamically in a self-adaptive manner, speeds up dimensionality reduction, and clusters data streams effectively and efficiently in an online and dynamic environment. Experiments on a wide range of synthetic and real-world data streams show that ESA-Stream considerably outperforms state-of-the-art baselines in both effectiveness and efficiency.
Pages: 617-630
Number of pages: 14