Maintaining stream statistics over sliding windows

Cited by: 255
Authors
Datar, M [1]
Gionis, A
Indyk, P
Motwani, R
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] MIT, Comp Sci Lab, Cambridge, MA 02139 USA
Keywords
statistics; data streams; sliding windows; approximation algorithms
DOI
10.1137/S0097539701398363
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We consider the problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far. We refer to this model as the sliding window model. We consider the following basic problem: Given a stream of bits, maintain a count of the number of 1s in the last N elements seen from the stream. We show that, using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, we can estimate the number of 1s to within a factor of $1 + \epsilon$. We also give a matching lower bound of $\Omega(\frac{1}{\epsilon} \log^2 N)$ memory bits for any deterministic or randomized algorithm. We extend our scheme to maintain the sum of the last N positive integers and provide matching upper and lower bounds for this more general problem as well. We also show how to efficiently compute the $L_p$ norms ($p \in [1, 2]$) of vectors in the sliding window model using our techniques. Using our algorithm, one can adapt many other techniques to work for the sliding window model with a multiplicative overhead of $O(\frac{1}{\epsilon} \log N)$ in memory and a $1 + \epsilon$ factor loss in accuracy. These include maintaining approximate histograms, hash tables, and statistics or aggregates such as sums and averages.
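The abstract states the basic counting problem but not the data structure behind the bound. As an illustration only, the following Python sketch shows one bucket-based way to approximate the number of 1s in the last N bits, in the style usually called an exponential histogram. The class name `SlidingBitCounter`, its method names, and the choice of keeping up to ⌈1/ε⌉ + 1 buckets per size are assumptions made for this sketch (the bucket limit is deliberately conservative); they are not taken from the record above and may differ from the paper's exact parameters.

```python
from math import ceil


class SlidingBitCounter:
    """Approximate count of 1s among the last `window` bits (illustrative sketch)."""

    def __init__(self, window, eps):
        self.window = window
        # Assumption for this sketch: allow up to ceil(1/eps) + 1 buckets per size.
        self.max_per_size = ceil(1 / eps) + 1
        self.buckets = []  # newest first; each entry is (time of newest 1, size)
        self.time = 0
        self.total = 0  # sum of all bucket sizes

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its newest 1 has left the window.
        # Timestamps are distinct, so at most one bucket expires per step.
        if self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.total -= self.buckets.pop()[1]
        if not bit:
            return
        self.buckets.insert(0, (self.time, 1))
        self.total += 1
        self._merge()

    def _merge(self):
        b = self.buckets
        i = 0
        while i < len(b):
            size = b[i][1]
            j = i
            while j < len(b) and b[j][1] == size:
                j += 1
            # b[i:j] is the run of buckets with this size.
            if j - i <= self.max_per_size:
                break
            # Too many: merge the two oldest, keeping the newer timestamp.
            b[j - 2] = (b[j - 2][0], 2 * size)
            del b[j - 1]
            i = j - 2  # the merged bucket may overflow the next size up

    def estimate(self):
        if not self.buckets:
            return 0
        # Only the oldest bucket can straddle the window edge; count half of it.
        return self.total - self.buckets[-1][1] // 2


# Usage: feed bits one at a time and query at any point.
counter = SlidingBitCounter(window=1000, eps=0.1)
for i in range(5000):
    counter.add(1 if i % 3 else 0)
approx_ones = counter.estimate()
```

Bucket sizes are powers of two, so the number of buckets grows roughly logarithmically in the window length for a fixed ε. Only the oldest bucket contributes uncertainty, which is why the estimate subtracts half of its size.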
Pages: 1794-1813
Page count: 20