Data Stream Clustering: A Survey

Cited by: 330
Authors
Silva, Jonathan A. [1]
Faria, Elaine R. [1, 2]
Barros, Rodrigo C. [1]
Hruschka, Eduardo R. [1]
de Carvalho, Andre C. P. L. F. [1]
Gama, Joao [3, 4]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci ICMC, Sao Paulo, Brazil
[2] Univ Fed Uberlandia, Sch Comp, BR-38400 Uberlandia, MG, Brazil
[3] Univ Porto, Lab Artificial Intelligence & Decis Support LIAAD, P-4100 Oporto, Portugal
[4] Univ Porto, FEP, P-4100 Oporto, Portugal
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Algorithms; Data stream clustering; online clustering; ALGORITHM; FRAMEWORK; TREES; MODEL;
DOI
10.1145/2522968.2522981
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Data stream mining is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. In this context, several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering poses several challenges, such as dealing with nonstationary, unbounded data that arrive in an online fashion. The intrinsic nature of stream data requires the development of algorithms capable of fast, incremental processing of data objects under time and memory limitations. In this article, we present a survey of data stream clustering algorithms, providing a thorough discussion of the main design components of state-of-the-art algorithms. In addition, this work addresses the temporal aspects involved in data stream clustering and presents an overview of the experimental methodologies usually employed. A number of references are provided that describe applications of data stream clustering in different domains, such as network intrusion detection, sensor networks, and stock market analysis. Information regarding software packages and data repositories is also provided to help researchers and practitioners. Finally, some important issues and open questions that can be the subject of future research are discussed.
Pages: 31