An adaptive approach for online monitoring of large-scale data streams

Authors
Cao, Shuchen [1]
Zhang, Ruizhi [2]
Affiliations
[1] Univ Nebraska Lincoln, Dept Stat, Lincoln, NE USA
[2] Univ Georgia, Dept Stat, Athens, GA USA
Keywords
False discovery rate; CUSUM; quickest change detection; process control; change-point detection; changepoint detection; schemes
DOI
10.1080/24725854.2023.2281580
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
In this article, we propose an adaptive top-r method to monitor large-scale data streams in which a change may affect an unknown subset of the data streams at some unknown time. Motivated by parallel and distributed computing, we develop global monitoring schemes by running local detection procedures in parallel and then using the Benjamini-Hochberg false discovery rate control procedure to estimate the number of changed data streams adaptively. Our approach is illustrated with two concrete examples. The first is a homogeneous case in which all data streams are independent and identically distributed with the same known pre-change and post-change distributions. The second is a case in which all data are normally distributed and the mean shifts are unknown and can be positive or negative. Theoretically, we show that when the pre-change and post-change distributions are completely specified, our proposed method can estimate the number of changed data streams for both the pre-change and post-change status. Moreover, we perform simulations and two case studies to show its detection efficiency.
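The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a N(0, 1) to N(mu, 1) mean shift with known mu for the local CUSUM recursion, and it takes per-stream p-values as given, because the abstract does not say how local statistics are converted to p-values. The function names `cusum_update`, `bh_num_rejections`, and `top_r_sum` are hypothetical.

```python
from typing import Sequence


def cusum_update(prev: float, x: float, mu: float) -> float:
    """One CUSUM step for an assumed N(0, 1) -> N(mu, 1) mean shift."""
    # Log-likelihood ratio of N(mu, 1) against N(0, 1) at observation x.
    llr = mu * x - 0.5 * mu * mu
    # Reset to zero whenever the accumulated evidence turns negative.
    return max(0.0, prev + llr)


def bh_num_rejections(p_values: Sequence[float], alpha: float) -> int:
    """Benjamini-Hochberg step-up rule: largest k with p_(k) <= k * alpha / m.

    The returned count serves here as an estimate of how many streams changed.
    """
    m = len(p_values)
    k_hat = 0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k * alpha / m:
            k_hat = k
    return k_hat


def top_r_sum(stats: Sequence[float], r: int) -> float:
    """Sum of the r largest local statistics (0.0 when r == 0)."""
    return sum(sorted(stats, reverse=True)[:r])


# Three streams, two time steps; only the second stream drifts upward.
stats = [0.0, 0.0, 0.0]
for obs in [(0.1, 2.0, -0.3), (-0.2, 1.5, 0.4)]:
    stats = [cusum_update(w, x, mu=1.0) for w, x in zip(stats, obs)]

# Hypothetical p-values, one per stream, supplied directly for illustration.
r_hat = bh_num_rejections([0.001, 0.008, 0.039, 0.041, 0.6], alpha=0.05)
```

In this sketch each stream updates its own statistic independently, which is what makes the local step easy to run in parallel. A global statistic could then be formed as `top_r_sum(stats, r_hat)` and compared against a threshold, whose choice is not covered here.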
Pages: 119-130
Number of pages: 12