Fast and Accurate Detection of Changes in Data Streams

Cited by: 4
Authors
Badarna, Murad [1]
Wolff, Ran [1]
Affiliations
[1] Univ Haifa, Dept Informat Syst, IL-31905 Haifa, Israel
Keywords
data stream; change detection; two-sample test; big data; concept drift
DOI
10.1002/sam.11216
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Change detection is one of the most important tasks in time series analysis. When the series is very long, or when it is rapidly updated, it has to be treated as a stream. This means that the change detection algorithm must process each sample in O(1) time and memory. A good algorithm must be generic in terms of the types of changes it can detect. Above all, a good algorithm must offer a favorable and controlled ratio of the number of samples needed to detect a change to the rate of false positives. We present a change-point detection algorithm called ProTO, which dynamically manages a set of candidate change-points whose expected size is a controllable constant. In terms of sample processing, ProTO is comparable with the fastest known algorithm, the Page-Hinkley Test (PHT). Yet, because PHT is limited to just one candidate, ProTO outperforms it in terms of the ratio of the delay to the false positive rate, as well as in terms of robustness. We provide variants of ProTO for detecting changes in the mean or the variance of the stream, and experiment with two realistic applications, as well as with synthetic data. On real problems, ProTO compares favorably with state-of-the-art algorithms implemented in the R-package, which require more than O(1) time per sample. © 2014 Wiley Periodicals, Inc.
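For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of one common textbook formulation of the Page-Hinkley Test for an upward shift in the mean. It is not ProTO and is not taken from the paper; the class name `PageHinkley`, the helper `first_alarm`, and the values of `delta` and `threshold` are illustrative assumptions. It shows the O(1) time and memory per sample that the abstract refers to: only a count, a running mean, and two cumulative statistics are stored.

```python
from typing import Iterable, Optional


class PageHinkley:
    """Textbook-style Page-Hinkley test for an increase in the stream mean.

    Assumed parameters (illustrative, not from the paper):
      delta     -- tolerated drift magnitude
      threshold -- alarm threshold (often written lambda)
    """

    def __init__(self, delta: float = 0.005, threshold: float = 50.0) -> None:
        self.delta = delta
        self.threshold = threshold
        self.n = 0           # number of samples seen so far
        self.mean = 0.0      # running mean of the stream
        self.cum = 0.0       # cumulative deviation from the running mean
        self.min_cum = 0.0   # minimum of the cumulative deviation so far

    def update(self, x: float) -> bool:
        """Process one sample in constant time; return True on an alarm."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold


def first_alarm(stream: Iterable[float], delta: float = 0.005,
                threshold: float = 50.0) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if none is raised."""
    detector = PageHinkley(delta=delta, threshold=threshold)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: the mean jumps from 0 to 5 at index 200.
    data = [0.0] * 200 + [5.0] * 100
    print("first alarm at index:", first_alarm(data))
```

Because the detector tracks a single cumulative statistic, it effectively follows one candidate change-point, which is the limitation of PHT that the abstract contrasts with ProTO's managed set of candidates.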
Pages: 125-139
Page count: 15
Related papers
50 in total
  • [2] Discussion on Fast and Accurate Sketches for Skewed Data Streams: A Case Study
    Sun, Shuhao
    Li, Dagang
    [J]. WEB AND BIG DATA (APWEB-WAIM 2018), PT II, 2018, 10988 : 75 - 89
  • [3] Detection and classification of changes in evolving data streams
    Gaber, Mohamed Medhat
    Yu, Philip S.
    [J]. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2006, 5 (04) : 659 - 670
  • [4] A Fast and Efficient Local Outlier Detection in Data Streams
    Yang, Xing
    Zhou, Wenli
    Shu, Nanfei
    Zhang, Hao
    [J]. PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO AND SIGNAL PROCESSING (IVSP 2019), 2019 : 111 - 116
  • [5] DAMP: accurate time series anomaly detection on trillions of datapoints and ultra-fast arriving data streams
    Lu, Yue
    Wu, Renjie
    Mueen, Abdullah
    Zuluaga, Maria A.
    Keogh, Eamonn
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 37 (02) : 627 - 669
  • [6] DAMP: accurate time series anomaly detection on trillions of datapoints and ultra-fast arriving data streams
    Yue Lu
    Renjie Wu
    Abdullah Mueen
    Maria A. Zuluaga
    Eamonn Keogh
    [J]. Data Mining and Knowledge Discovery, 2023, 37 : 627 - 669
  • [7] Learning accurate very fast decision trees from uncertain data streams
    Liang, Chunquan
    Zhang, Yang
    Shi, Peng
    Hu, Zhengguo
    [J]. INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2015, 46 (16) : 3032 - 3050
  • [8] Fast Memory Efficient Local Outlier Detection in Data Streams
    Salehi, Mahsa
    Leckie, Christopher
    Bezdek, James C.
    Vaithianathan, Tharshan
    Zhang, Xuyun
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (12) : 3246 - 3260
  • [9] A Fast and Efficient Algorithm for Outlier Detection Over Data Streams
    Hassaan, Mosab
    Maher, Hend
    Gouda, Karam
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (11) : 749 - 756
  • [10] SimpleLock+: Fast and Accurate Hybrid Data Race Detection
    Yu, Misun
    Bae, Doo-Hwan
    [J]. COMPUTER JOURNAL, 2016, 59 (06) : 793 - 809