ESA-Stream: Efficient Self-Adaptive Online Data Stream Clustering

Cited by: 18
Authors
Li, Yanni [1]
Li, Hui [2]
Wang, Zhi [1]
Liu, Bing [3, 4]
Cui, Jiangtao [1]
Fei, Hang [1]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
[3] Peking Univ, Wangxuan Inst Comp Technol, Beijing 100871, Peoples R China
[4] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China;
Keywords
Clustering algorithms; Heuristic algorithms; Real-time systems; Partitioning algorithms; Dimensionality reduction; Clustering methods; Indexes; Self-adaptive; data stream; online clustering;
DOI
10.1109/TKDE.2020.2990196
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many big data applications produce massive amounts of high-dimensional, real-time, and evolving streaming data. Clustering such data streams both effectively and efficiently is critical for these applications. Although there are well-known data stream clustering algorithms based on the popular online-offline framework, these algorithms still face major challenges. Several critical questions have not yet been answered satisfactorily: How can dimensionality reduction be performed effectively and efficiently in an online dynamic environment? How can a clustering algorithm achieve fully real-time online processing? How can algorithm parameters be learned in a self-supervised or self-adaptive manner to cope with high-speed evolving streams? In this paper, we tackle these challenges by proposing a fully online data stream clustering algorithm (called ESA-Stream) that can learn parameters online and dynamically in a self-adaptive manner, speed up dimensionality reduction, and cluster data streams effectively and efficiently in an online and dynamic environment. Experiments on a wide range of synthetic and real-world data streams show that ESA-Stream outperforms state-of-the-art baselines considerably in both effectiveness and efficiency.
Pages: 617-630
Page count: 14
Related papers
50 in total
  • [1] ESA-Stream: Efficient Self-Adaptive Online Data Stream Clustering (Extended Abstract)
    Li, Yanni
    Li, Hui
    Wang, Zhi
    Liu, Bing
    Cui, Jiangtao
    Fei, Hang
    [J]. 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 2329 - +
  • [2] Self-Adaptive Anytime Stream Clustering
    Kranen, Philipp
    Assent, Ira
    Baldauf, Corinna
    Seidl, Thomas
    [J]. 2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 249 - +
  • [3] Self-Adaptive Framework for Efficient Stream Data Classification on Storm
    Deng, Shizhuo
    Wang, Botao
    Huang, Shan
    Yue, Chuncheng
    Zhou, Jianpeng
    Wang, Guoren
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (01): : 123 - 136
  • [4] OPOSSAM: Online Prediction of Stream Data Using Self-adaptive Memory
    Yamaguchi, Akihiro
    Maya, Shigeru
    Inagi, Tatsuya
    Ueno, Ken
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 2355 - 2364
  • [5] Research on Self-Adaptive Stream Data Mining
    Xiao, Fang
    [J]. 2016 INTERNATIONAL CONGRESS ON COMPUTATION ALGORITHMS IN ENGINEERING (ICCAE 2016), 2016, : 1 - 7
  • [6] Self-adaptive Clustering Data Stream Algorithm Based on SSMC-Tree
    Yang, Kehua
    Gao, Heqing
    Chen, Lin
    Yuan, Qiong
    [J]. PROCEEDINGS OF 2013 IEEE 4TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS), 2012, : 342 - 345
  • [7] A Resource-Efficient Monitoring Architecture for Hardware Accelerated Self-Adaptive Online Data Stream Compression
    Najmabadi, Seyyed Mahdi
    Pandit, Prajwala
    Tran, Trung-Hieu
    Simon, Sven
    [J]. 2017 SIGNAL PROCESSING: ALGORITHMS, ARCHITECTURES, ARRANGEMENTS, AND APPLICATIONS (SPA 2017), 2017, : 222 - 227
  • [8] A Self-Adaptive Dynamic Partial Reconfigurable Architecture for Online Data Stream Compression
    Najmabadi, Seyyed Mahdi
    Wang, Zhe
    Baroud, Yousef
    Simon, Sven
    [J]. 2016 INTERNATIONAL CONFERENCE ON FPGA RECONFIGURATION FOR GENERAL-PURPOSE COMPUTING (FPGA4GPC), 2016, : 19 - 24
  • [9] A density-based competitive data stream clustering network with self-adaptive distance metric
    Xu, Baile
    Shen, Furao
    Zhao, Jinxi
    [J]. NEURAL NETWORKS, 2019, 110 : 141 - 158
  • [10] A Distributed Framework for Online Stream Data Clustering
    Ding, Jiafeng
    Fang, Junhua
    Chao, Pingfu
    Xu, Jiajie
    Zhao, PengPeng
    Zhao, Lei
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2020, PT I, 2020, 12452 : 190 - 204