An evolving approach to data streams clustering based on typicality and eccentricity data analytics

Cited by: 37
Authors
Bezerra, Clauber Gomes [1]
Jales Costa, Bruno Sielly [2]
Guedes, Luiz Affonso [3]
Angelov, Plamen Parvanov [4]
Affiliations
[1] Fed Inst Educ Sci & Technol Rio Grande Norte do N, Campus Natal Zona Leste, BR-59015000 Natal, RN, Brazil
[2] Fed Inst Educ Sci & Technol Rio Grande Norte do N, Campus Natal Zona Norte Rua Brusque 2926, BR-59112490 Natal, RN, Brazil
[3] Fed Inst Educ Sci & Technol Rio Grande Norte do N, Dept Comp Engn & Automat, DCA Campus Univ, BR-59078900 Natal, RN, Brazil
[4] Univ Lancaster, Sch Comp & Commun, Data Sci Grp, Lancaster LA1 4WA, England
Keywords
Online clustering; Data stream; Eccentricity; Typicality; Anomaly detection; FAULT-DETECTION; FUZZY; IDENTIFICATION; CLASSIFICATION;
DOI
10.1016/j.ins.2019.12.022
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In this paper we propose an algorithm for online clustering of data streams. This algorithm is called AutoCloud and is based on the recently introduced concept of Typicality and Eccentricity Data Analytics, mainly used for anomaly detection tasks. AutoCloud is an evolving, online and recursive technique that does not need training or prior knowledge about the data set. Thus, AutoCloud is fully online, requiring no offline processing. It allows creation and merging of clusters autonomously as new data observations become available. The clusters created by AutoCloud are called data clouds, which are structures without pre-defined shape or boundaries. AutoCloud allows each data sample to belong to multiple data clouds simultaneously using fuzzy concepts. AutoCloud is also able to handle concept drift and concept evolution, which are problems that are inherent in data streams in general. Since the algorithm is recursive and online, it is suitable for applications that require a real-time response. We validate our proposal with applications to multiple well-known data sets in the literature. (C) 2020 Elsevier Inc. All rights reserved.
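As an illustration of the recursive, training-free computation the abstract describes, the following Python sketch updates a running mean and variance and derives a TEDA-style eccentricity for each incoming sample. It is not the authors' AutoCloud implementation: the update formulas, the Chebyshev-style threshold (m^2 + 1) / (2k), the class name `RecursiveEccentricity`, and the default `m = 3` are assumptions taken from the general typicality and eccentricity literature, and cloud creation, merging, and fuzzy membership are omitted.

```python
import numpy as np


class RecursiveEccentricity:
    """Running statistics and eccentricity for a single data cloud (sketch)."""

    def __init__(self, m: float = 3.0):
        self.m = m        # sensitivity parameter; 3.0 is an assumed default
        self.k = 0        # number of samples absorbed so far
        self.mean = None  # running mean vector
        self.var = 0.0    # running scalar variance

    def update(self, x):
        """Absorb sample x and return (normalized eccentricity, is_outlier)."""
        x = np.atleast_1d(np.asarray(x, dtype=float))
        self.k += 1
        k = self.k
        if self.mean is None:
            self.mean = np.zeros_like(x)
        # Recursive mean update: no past samples are stored.
        self.mean = ((k - 1) / k) * self.mean + x / k
        # Squared distance from the sample to the updated mean.
        d2 = float(np.sum((x - self.mean) ** 2))
        # Recursive variance update (assumed form).
        self.var = ((k - 1) / k) * self.var + d2 / k
        # Eccentricity; the distance term is treated as 0 when variance is 0.
        ecc = 1.0 / k + (d2 / (k * self.var) if self.var > 0 else 0.0)
        zeta = ecc / 2.0  # normalized eccentricity
        threshold = (self.m ** 2 + 1) / (2 * k)
        return zeta, zeta > threshold


if __name__ == "__main__":
    cloud = RecursiveEccentricity(m=3.0)
    stream = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.02, -0.02, 10.0]
    for sample in stream:
        zeta, outlier = cloud.update([sample])
        print(f"x={sample:6.2f}  zeta={zeta:.3f}  outlier={outlier}")
```

Because each update touches only the sample count, the mean, and the variance, memory use stays constant regardless of stream length, which is the property that makes this style of computation suitable for real-time use.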
Pages: 13 - 28
Page count: 16
Related papers
50 in total
  • [21] Hierarchical clustering for multiple nominal data streams with evolving behaviour
    Sangma, Jerry W.
    Sarkar, Mekhla
    Pal, Vipin
    Agrawal, Amit
    Yogita
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (02) : 1737 - 1761
  • [22] S-RASTER: contraction clustering for evolving data streams
    Gregor Ulm
    Simon Smith
    Adrian Nilsson
    Emil Gustavsson
    Mats Jirstrand
    [J]. Journal of Big Data, 7
  • [23] Online Clustering for Evolving Data Streams with Online Anomaly Detection
    Chenaghlou, Milad
    Moshtaghi, Masud
    Leckie, Christopher
    Salehi, Mahsa
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II, 2018, 10938 : 506 - 519
  • [24] S-RASTER: contraction clustering for evolving data streams
    Ulm, Gregor
    Smith, Simon
    Nilsson, Adrian
    Gustavsson, Emil
    Jirstrand, Mats
    [J]. JOURNAL OF BIG DATA, 2020, 7 (01)
  • [25] Hierarchical clustering for multiple nominal data streams with evolving behaviour
    Jerry W. Sangma
    Mekhla Sarkar
    Vipin Pal
    Amit Agrawal
    [J]. Complex & Intelligent Systems, 2022, 8 : 1737 - 1761
  • [26] Distributed weighted clustering of evolving sensor data streams with noise
    Hassani, Marwan
    Seidl, Thomas
    [J]. Journal of Digital Information Management, 2012, 10 (06) : 410 - 420
  • [27] A fuzzy c means variant for clustering evolving data streams
    Hore, Prodip
    Hall, Lawrence O.
    Goldgof, Dmitry B.
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 802 - 807
  • [28] A DATA STREAMS CLUSTERING ALGORITHM BASED ON INTERVAL DATA
    Li, Yan
    Ye, Ming
    Wang, Huiwen
    Liu, Dan
    Che, Yin
    [J]. PROCEEDINGS OF THE 38TH INTERNATIONAL CONFERENCE ON COMPUTERS AND INDUSTRIAL ENGINEERING, VOLS 1-3, 2008, : 2775 - 2778
  • [29] Data summarisation by typicality-based clustering for vectorial and non vectorial data
    Lesot, Marie-Jeanne
    Kruse, Rudolf
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-5, 2006, : 547 - +
  • [30] A Clustering Approach for Anonymizing Distributed Data Streams
    Mohamed, Mona A.
    Nagi, Magdy H.
    Ghanem, Sahar M.
    [J]. PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2016, : 9 - 16