Tracking clusters and anomalies in evolving data streams

Cited by: 4
Authors
Guggilam, Sreelekha [1 ]
Chandola, Varun [1 ,2 ]
Patra, Abani [1 ,3 ]
Affiliations
[1] Univ Buffalo State Univ New York SUNY, Computat Data Sci & Engn, Buffalo, NY 14260 USA
[2] Univ Buffalo State Univ New York SUNY, Comp Sci & Engn, Buffalo, NY USA
[3] Tufts Univ, Data Intens Studies Ctr, Medford, MA 02155 USA
Funding
U.S. National Science Foundation (NSF);
Keywords
anomaly detection; Bayesian nonparametric models; clustering-based anomaly detection; evolving stream data; extreme value theory; EXTREME-VALUE THEORY; ALGORITHMS;
DOI
10.1002/sam.11552
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data-driven anomaly detection methods typically build a model for the normal behavior of the target system, and score each data instance with respect to this model. A threshold is invariably needed to identify data instances with high (or low) scores as anomalies. This presents a practical limitation on the applicability of such methods, since most methods are sensitive to the choice of the threshold, and it is challenging to set optimal thresholds. The issue is exacerbated in a streaming scenario, where the optimal thresholds vary with time. We present a probabilistic framework to explicitly model the normal and anomalous behaviors and probabilistically reason about the data. An extreme value theory based formulation is proposed to model the anomalous behavior as the extremes of the normal behavior. As a specific instantiation, a joint nonparametric clustering and anomaly detection algorithm (INCAD) is proposed that models the normal behavior as a Dirichlet process mixture model. Results on a variety of datasets, including streaming data, show that the proposed method provides effective and simultaneous clustering and anomaly detection without requiring strong initialization and threshold parameters.
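The abstract states that normal behavior is modeled as a Dirichlet process mixture, which is why the number of clusters does not need to be fixed in advance. The following is a minimal sketch of the standard Chinese restaurant process prior that underlies such models; it is not the INCAD algorithm, and the function names (`crp_assignment_probs`, `sample_crp`) and the concentration value `alpha` are illustrative assumptions, not taken from the paper.

```python
import random


def crp_assignment_probs(counts, alpha):
    """Prior probabilities for assigning the next point under a
    Chinese restaurant process (hypothetical helper, not from the paper).

    counts: sizes of the existing clusters
    alpha:  concentration parameter (> 0)
    Returns len(counts) + 1 probabilities; the last entry is the
    probability of opening a new cluster.
    """
    denom = sum(counts) + alpha
    return [c / denom for c in counts] + [alpha / denom]


def sample_crp(n_points, alpha, seed=0):
    """Sequentially assign n_points to clusters and return cluster sizes."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_points):
        probs = crp_assignment_probs(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
    return counts


if __name__ == "__main__":
    print(crp_assignment_probs([3, 1], alpha=1.0))
    print(sample_crp(100, alpha=1.0))
```

A larger `alpha` makes new clusters more likely. In a full mixture model these prior weights would be combined with each cluster's likelihood for the incoming point, a step this sketch omits.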
Pages: 156-178
Page count: 23
Related papers
50 in total
  • [31] SPARSE SUBSPACE CLUSTERING FOR EVOLVING DATA STREAMS
    Sui, Jinping
    Liu, Zhen
    Liu, Li
    Jung, Alexander
    Liu, Tianpeng
    Peng, Bo
    Li, Xiang
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7455 - 7459
  • [32] Evolving fuzzy systems for data streams: a survey
    Baruah, Rashmi Dutta
    Angelov, Plamen
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2011, 1 (06) : 461 - 476
  • [33] Feature Drift Detection in Evolving Data Streams
    Zhao, Di
    Koh, Yun Sing
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2020, PT II, 2020, 12392 : 335 - 349
  • [34] K-means for Evolving Data Streams
    Bidaurrazaga, Arkaitz
    Perez, Aritz
    Capo, Marco
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 1006 - 1011
  • [35] Logistic regression for evolving data streams classification
    Dept. of Computer Science and Eng., Shanghai Jiaotong Univ., Shanghai 200030, China
    J. Shanghai Jiaotong Univ. Sci., 2007, 2 (197-203):
  • [36] Kalman Filtering for Learning with Evolving Data Streams
    Ziffer, Giacomo
    Bernardo, Alessio
    Della Valle, Emanuele
    Bifet, Albert
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 5337 - 5346
  • [37] Mining Evolving Data Streams with Particle Filters
    Fok, Ricky
    An, Aijun
    Wang, Xiaogang
    COMPUTATIONAL INTELLIGENCE, 2017, 33 (02) : 147 - 180
  • [38] Labeling Instances in Evolving Data Streams with MapReduce
    Haque, Ahsanul
    Parker, Brandon
    Khan, Latifur
    2013 IEEE INTERNATIONAL CONGRESS ON BIG DATA, 2013, : 387 - 394
  • [39] Detection and classification of changes in evolving data streams
    Gaber, Mohamed Medhat
    Yu, Philip S.
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2006, 5 (04) : 659 - 670
  • [40] Online embedding and clustering of evolving data streams
    Zubaroglu, Alaettin
    Atalay, Volkan
    STATISTICAL ANALYSIS AND DATA MINING, 2023, 16 (01) : 29 - 44