Statistical hierarchical clustering algorithm for outlier detection in evolving data streams

Cited by: 20
Authors
Krleza, Dalibor [1]
Vrdoljak, Boris [1]
Brcic, Mario [1]
Affiliations
[1] Univ Zagreb, Fac Elect Engn & Comp, Unska 3, Zagreb, Croatia
Keywords
Big data; Clustering; Anomaly detection; Fraud detection
DOI
10.1007/s10994-020-05905-4
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Anomaly detection is a hard data analysis process that requires constant creation and improvement of data analysis algorithms. Using traditional clustering algorithms to analyse data streams is impossible due to processing power and memory issues. To solve this, the complexity of traditional clustering algorithms needed to be reduced, which led to the creation of sequential clustering algorithms. The usual approach is two-phase clustering, which uses an online phase to relax data details and complexity, and an offline phase to cluster the concepts created in the online phase. Detecting anomalies in a data stream is usually solved in the online phase, as it requires unreduced data. In contrast, good macro-clustering is produced in the offline phase, which is why two-phase clustering algorithms have difficulty being equally good at anomaly detection and macro-clustering. In this paper, we propose a statistical hierarchical clustering algorithm equally suitable for both detecting anomalies and macro-clustering. The proposed algorithm is single-phased and uses statistical inference on the input data stream, resulting in statistical distributions that are constantly updated. This makes the classification adaptable: it allows agglomeration of outliers into clusters, tracks population evolution, and can be used without knowing the expected number of clusters and outliers. The proposed algorithm was tested against typical clustering algorithms, including two-phase algorithms suitable for data stream analysis. A number of typical test cases were selected to show the universality and qualities of the proposed clustering algorithm.
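The abstract's idea of statistical distributions that are constantly updated from the stream can be illustrated with a deliberately small sketch. This is not the authors' algorithm: it assumes one-dimensional data, a single Gaussian population, Welford's incremental mean/variance update, and a z-score test. The names `RunningGaussian` and `flag_outliers`, the threshold of 3.0, and the warm-up of 5 points are all hypothetical choices made for illustration.

```python
import math


class RunningGaussian:
    """Mean and variance maintained incrementally (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Fold one new observation into the running statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until two points have been seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def z_score(self, x):
        s = self.std()
        return abs(x - self.mean) / s if s > 0 else 0.0


def flag_outliers(stream, threshold=3.0, warmup=5):
    """Return the values whose z-score exceeds `threshold` on arrival."""
    stats = RunningGaussian()
    outliers = []
    for x in stream:
        if stats.n >= warmup and stats.z_score(x) > threshold:
            # Assumption of this sketch: flagged points are kept apart and
            # never agglomerated into a new cluster, unlike in the paper.
            outliers.append(x)
        else:
            stats.update(x)
    return outliers


readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0, 10.3, 9.7]
print(flag_outliers(readings))
```

Each point is processed once and then discarded, so memory use stays constant regardless of stream length. Extending this to several populations, hierarchy, and agglomeration of outliers into clusters is what the paper itself addresses.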
Pages: 139-184
Page count: 46
Related papers
50 in total
  • [1] Statistical hierarchical clustering algorithm for outlier detection in evolving data streams
    Dalibor Krleža
    Boris Vrdoljak
    Mario Brčić
    [J]. Machine Learning, 2021, 110 : 139 - 184
  • [2] A Hybrid Clustering Algorithm for Outlier Detection in Data Streams
    Vijayarani, S.
    Jothi, P.
    [J]. INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2016, 9 (11): : 285 - 295
  • [3] A Framework for Outlier Detection in Evolving Data Streams by Weighting Attributes in Clustering
    Yogita
    Toshniwal, Durga
    [J]. 2ND INTERNATIONAL CONFERENCE ON COMMUNICATION, COMPUTING & SECURITY [ICCCS-2012], 2012, 1 : 214 - 222
  • [4] An Outlier Detection Algorithm for Data Streams Based on Fuzzy Clustering
    Su, Xiaoke
    Qin, Yuming
    Wan, Renxia
    [J]. PROGRESS IN INTELLIGENCE COMPUTATION AND APPLICATIONS, 2008, : 109 - 112
  • [5] Hierarchical clustering for multiple nominal data streams with evolving behaviour
    Sangma, Jerry W.
    Sarkar, Mekhla
    Pal, Vipin
    Agrawal, Amit
    Yogita
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (02) : 1737 - 1761
  • [6] Hierarchical clustering for multiple nominal data streams with evolving behaviour
    Jerry W. Sangma
    Mekhla Sarkar
    Vipin Pal
    Amit Agrawal
    [J]. Complex & Intelligent Systems, 2022, 8 : 1737 - 1761
  • [7] Outlier Detection in Data Streams Using Various Clustering Approaches
    Makkar, Kusum
    Sharma, Meghna
    [J]. 2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015, : 690 - 693
  • [8] An Adaptive Clustering Approach for Distributed Outlier Detection in Data Streams
    Della Monaca, Andrea
    Cafaro, Massimo
    Pulimeno, Marco
    Epicoco, Italo
    [J]. 19TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 2023, 583 : 86 - 99
  • [9] Outlier Detection in Data Streams using MCOD Algorithm
    Reddy, S. Vishnu Vardhan
    Harshita, T.
    Akhil, S.
    Ashesh, K.
    [J]. PROCEEDINGS OF THE 2017 3RD INTERNATIONAL CONFERENCE ON APPLIED AND THEORETICAL COMPUTING AND COMMUNICATION TECHNOLOGY (ICATCCT), 2017, : 328 - 333
  • [10] An auto-stopped hierarchical clustering algorithm integrating outlier detection algorithm
    Lv, TY
    Su, TX
    Wang, ZX
    Zuo, WL
    [J]. ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2005, 3739 : 464 - 474