Random sampling algorithms for sliding windows over data streams

Cited by: 3
Authors
Zhang, LB [1]
Li, ZH [1]
Yu, M [1]
Wang, Y [1]
Jiang, Y [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
Keywords
data streams; random sampling; sliding window; approximate algorithm
DOI
10.1142/9789812701534_0129
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Interest in algorithms over data streams has grown recently. This paper introduces the problem of sampling from sliding windows of recent items in a data stream and presents two random sampling algorithms for it. The first is a basic window-based random sampling algorithm (the BWRS algorithm) for count-based sliding windows. BWRS extends classic reservoir sampling to handle the expiration of data elements from a count-based sliding window, and it avoids the drawbacks of classic reservoir sampling. The second is a stratified multistage sampling algorithm (the SMS algorithm) for time-based sliding windows. SMS uses a different sampling fraction in each stratum of the time-based sliding window, and it works even when the number of data items in the window varies dynamically over time. Theoretical analysis and experiments show that both algorithms are effective and efficient for continuous data stream processing.
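The abstract does not spell out how BWRS or SMS work, so the sketch below does not implement either of them. It only illustrates the background the abstract refers to: classic reservoir sampling (Algorithm R), which samples uniformly from the whole stream and has no notion of expiration, next to one well-known way of sampling from a count-based sliding window, namely priority sampling with a single sample. The names `reservoir_sample` and `WindowSampler` and all parameters are hypothetical and chosen for illustration.

```python
import random
from collections import deque


def reservoir_sample(stream, k, rng=random):
    """Classic reservoir sampling (Algorithm R).

    Returns a uniform random sample of up to k items from the WHOLE stream.
    Items never expire, which is the limitation for sliding windows.
    """
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item     # replace with probability k / (i + 1)
    return reservoir


class WindowSampler:
    """Uniform single-item sample over the last `window` items.

    Illustrative priority sampling, NOT the paper's BWRS algorithm:
    every item gets an independent random priority, and the sample is the
    item with the highest priority among those still inside the window.
    """

    def __init__(self, window, rng=random):
        self.window = window
        self.rng = rng
        self.count = 0                  # number of items seen so far
        self.candidates = deque()       # (index, priority, item), priorities decreasing

    def add(self, item):
        priority = self.rng.random()
        # An older item with a lower priority can never become the sample again.
        while self.candidates and self.candidates[-1][1] <= priority:
            self.candidates.pop()
        self.candidates.append((self.count, priority, item))
        self.count += 1
        # Expire candidates whose index has left the window.
        while self.candidates[0][0] < self.count - self.window:
            self.candidates.popleft()

    def sample(self):
        return self.candidates[0][2] if self.candidates else None


if __name__ == "__main__":
    print(reservoir_sample(range(1000), 5))
    sampler = WindowSampler(window=100)
    for x in range(1000):
        sampler.add(x)
    print(sampler.sample())             # always one of the last 100 items
```

Because priorities are independent and identically distributed, each item currently in the window is equally likely to hold the maximum, so the sample is uniform over the window. The deque keeps only items that could still become the maximum, which stays small in expectation.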
Pages: 572-575
Page count: 4