Data Stream Clustering: A Survey

Cited by: 330
Authors
Silva, Jonathan A. [1]
Faria, Elaine R. [1,2]
Barros, Rodrigo C. [1]
Hruschka, Eduardo R. [1]
de Carvalho, Andre C. P. L. F. [1]
Gama, Joao [3,4]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci ICMC, Sao Paulo, Brazil
[2] Univ Fed Uberlandia, Sch Comp, BR-38400 Uberlandia, MG, Brazil
[3] Univ Porto, Lab Artificial Intelligence & Decis Support LIAAD, P-4100 Oporto, Portugal
[4] Univ Porto, FEP, P-4100 Oporto, Portugal
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Algorithms; Data stream clustering; online clustering; ALGORITHM; FRAMEWORK; TREES; MODEL;
DOI
10.1145/2522968.2522981
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Data stream mining is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. In this context, several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering poses several challenges, such as dealing with nonstationary, unbounded data that arrive in an online fashion. The intrinsic nature of stream data requires the development of algorithms capable of performing fast and incremental processing of data objects, suitably addressing time and memory limitations. In this article, we present a survey of data stream clustering algorithms, providing a thorough discussion of the main design components of state-of-the-art algorithms. In addition, this work addresses the temporal aspects involved in data stream clustering, and presents an overview of the experimental methodologies usually employed. A number of references are provided that describe applications of data stream clustering in different domains, such as network intrusion detection, sensor networks, and stock market analysis. Information on software packages and data repositories is also provided to help researchers and practitioners. Finally, some important issues and open questions that can be the subject of future research are discussed.
Pages: 31
Related papers
50 in total
  • [21] Clustering Categorical Data: A Survey
    Naouali, Sami
    Ben Salem, Semeh
    Chtourou, Zied
    [J]. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2020, 19 (01) : 49 - 96
  • [22] A Tensor Framework for Data Stream Clustering and Compression
    Cyganek, Boguslaw
    Wozniak, Michal
    [J]. IMAGE ANALYSIS AND PROCESSING (ICIAP 2017), PT I, 2017, 10484 : 163 - 173
  • [23] Functional data clustering: a survey
    Jacques, Julien
    Preda, Cristian
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2014, 8 (03) : 231 - 255
  • [24] Data Stream Clustering Based on Grid Coupling
    Zhang, Dong-Yue
    Zhou, Li-Hua
    Wu, Xiang-Yun
    Zhao, Li-Hong
    [J]. Ruan Jian Xue Bao/Journal of Software, 2019, 30 (03) : 667 - 683
  • [25] Feature-Based Data Stream Clustering
    Asbagh, Mohsen Jafari
    Abolhassani, Hassan
    [J]. PROCEEDINGS OF THE 8TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE, 2009, : 363 - 368
  • [26] An Adaptive Density Data Stream Clustering Algorithm
    Shifei Ding
    Jian Zhang
    Hongjie Jia
    Jun Qian
    [J]. Cognitive Computation, 2016, 8 : 30 - 38
  • [27] Intrusion detection based on clustering a data stream
    Oh, SH
    Kang, JS
    Byun, YC
    Park, GL
    Byun, SY
    [J]. Third ACIS International Conference on Software Engineering Research, Management and Applications, Proceedings, 2005, : 220 - 227
  • [28] A Novel Algorithm for Adaptive Data Stream Clustering
    Ansarifar, Farnaz
    Ahmadi, Ali
    [J]. 26TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE 2018), 2018, : 1542 - 1546
  • [29] An Ensemble Learning Approach for Data Stream Clustering
    Fathzadeh, Ramin
    Mokhtari, Vahid
    [J]. 2013 21ST IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2013,
  • [30] Varying density method for data stream clustering
    Mousavi, Maryam
    Khotanlou, Hassan
    Bakar, Azuraliza Abu
    Vakilian, Mohammadmahdi
    [J]. Applied Soft Computing Journal, 2020, 97