Tracking clusters and anomalies in evolving data streams

Cited by: 4
Authors
Guggilam, Sreelekha [1]
Chandola, Varun [1, 2]
Patra, Abani [1, 3]
Affiliations
[1] Univ Buffalo State Univ New York SUNY, Computat Data Sci & Engn, Buffalo, NY 14260 USA
[2] Univ Buffalo State Univ New York SUNY, Comp Sci & Engn, Buffalo, NY USA
[3] Tufts Univ, Data Intens Studies Ctr, Medford, MA 02155 USA
Funding
U.S. National Science Foundation
Keywords
anomaly detection; Bayesian nonparametric models; clustering-based anomaly detection; evolving stream data; extreme value theory; algorithms
DOI
10.1002/sam.11552
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data-driven anomaly detection methods typically build a model for the normal behavior of the target system, and score each data instance with respect to this model. A threshold is invariably needed to identify data instances with high (or low) scores as anomalies. This presents a practical limitation on the applicability of such methods, since most methods are sensitive to the choice of the threshold, and it is challenging to set optimal thresholds. The issue is exacerbated in a streaming scenario, where the optimal thresholds vary with time. We present a probabilistic framework to explicitly model the normal and anomalous behaviors and probabilistically reason about the data. An extreme value theory based formulation is proposed to model the anomalous behavior as the extremes of the normal behavior. As a specific instantiation, a joint nonparametric clustering and anomaly detection algorithm (INCAD) is proposed that models the normal behavior as a Dirichlet process mixture model. Results on a variety of datasets, including streaming data, show that the proposed method provides effective and simultaneous clustering and anomaly detection without requiring strong initialization and threshold parameters.
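The abstract's idea of treating anomalies as the extremes of normal behavior can be illustrated with a generic peaks-over-threshold sketch. This is not the INCAD algorithm: the record gives no implementation details, so the function names, the 0.95 tail quantile, the method-of-moments fit, and the synthetic Gaussian scores below are all assumptions chosen for illustration. The sketch fits a generalized Pareto distribution (GPD) to the excesses of normal scores over a high quantile, then reports how probable a new score is under that tail model instead of comparing it to a hand-picked cutoff.

```python
import math
import random


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (shape xi, scale sigma) of a GPD.

    Uses mean = sigma / (1 - xi) and var = sigma^2 / ((1 - xi)^2 (1 - 2 xi)).
    """
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((e - mean) ** 2 for e in excesses) / (n - 1)
    ratio = mean * mean / var  # equals 1 - 2 * xi under the GPD
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (ratio + 1.0)
    return xi, sigma


def fit_tail(scores, quantile=0.95):
    """Pick the tail start u (hypothetical default: 95th percentile) and fit the GPD."""
    ordered = sorted(scores)
    u = ordered[int(quantile * len(ordered))]
    excesses = [s - u for s in ordered if s > u]
    xi, sigma = fit_gpd_moments(excesses)
    exceed_rate = len(excesses) / len(ordered)
    return u, xi, sigma, exceed_rate


def tail_probability(x, u, xi, sigma, exceed_rate):
    """Estimate P(score > x) from the fitted tail model."""
    if x <= u:
        return 1.0  # placeholder: x is not inside the modeled tail
    z = (x - u) / sigma
    if abs(xi) < 1e-9:
        return exceed_rate * math.exp(-z)  # exponential limit as xi -> 0
    base = 1.0 + xi * z
    if base <= 0.0:
        return 0.0  # beyond the fitted finite endpoint (xi < 0)
    return exceed_rate * base ** (-1.0 / xi)


# Synthetic "normal behavior" scores; an assumption, not data from the paper.
rng = random.Random(0)
normal_scores = [rng.gauss(0.0, 1.0) for _ in range(2000)]
u, xi, sigma, rate = fit_tail(normal_scores)
for x in (2.0, 6.0):
    print(x, tail_probability(x, u, xi, sigma, rate))
```

A score with a very small tail probability is unlikely to be an extreme of the normal behavior, which gives a probabilistic reading of "anomalous". In a streaming setting the fit would have to be refreshed as the data evolve, and this sketch does not attempt that.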
Pages: 156-178
Number of pages: 23
Related papers
50 in total
  • [1] Tracking clusters in evolving data streams over sliding windows
    Zhou, Aoying
    Cao, Feng
    Qian, Weining
    Jin, Cheqing
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 15 (02) : 181 - 214
  • [2] Tracking clusters in evolving data streams over sliding windows
    Aoying Zhou
    Feng Cao
    Weining Qian
    Cheqing Jin
    [J]. Knowledge and Information Systems, 2008, 15 : 181 - 214
  • [3] TECNO-STREAMS: Tracking evolving clusters in noisy data streams with a scalable immune system learning model
    Nasraoui, F
    Uribe, CC
    Coronel, CR
    Gonzalez, F
    [J]. THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, : 235 - 242
  • [4] Robust Clustering for Tracking Noisy Evolving Data Streams
    Nasraoui, Olfa
    Rojas, Carlos
    [J]. PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2006, : 619 - 623
  • [5] MovStream: An Efficient Algorithm for Monitoring Clusters Evolving in Data Streams
    Tang, Liang
    Tang, Chang-jie
    Duan, Lei
    Li, Chuan
    Jiang, Ye-xi
    Zeng, Chun-qiu
    Zhu, Jun
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2, 2008, : 582 - +
  • [6] Identifying data streams anomalies by evolving spiking restricted Boltzmann machines
    Lining Xing
    Konstantinos Demertzis
    Jinghui Yang
    [J]. Neural Computing and Applications, 2020, 32 : 6699 - 6713
  • [7] Identifying data streams anomalies by evolving spiking restricted Boltzmann machines
    Xing, Lining
    Demertzis, Konstantinos
    Yang, Jinghui
    [J]. NEURAL COMPUTING & APPLICATIONS, 2020, 32 (11) : 6699 - 6713
  • [8] Rapidly Labeling and Tracking Dynamically Evolving Concepts In Data Streams
    Parker, Brandon S.
    Khan, Latifur
    [J]. 2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2013, : 1161 - 1164
  • [9] Fully online clustering of evolving data streams into arbitrarily shaped clusters
    Hyde, Richard
    Angelov, Plamen
    MacKenzie, A. R.
    [J]. INFORMATION SCIENCES, 2017, 382 : 96 - 114
  • [10] Tracking High Quality Clusters over Uncertain Data Streams
    Zhang, Chen
    Gao, Ming
    Zhou, Aoying
    [J]. ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, : 1641 - +