Random sampling for continuous streams with arbitrary updates

Cited by: 10
Authors
Tao, Yufei [1]
Lian, Xiang
Papadias, Dimitris
Hadjieleftheriou, Marios
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Sha Tin, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Clear Water Bay, Hong Kong, Peoples R China
[3] AT&T Labs, Florham Pk, NJ 07932 USA
Keywords
sampling; selectivity estimation
DOI
10.1109/TKDE.2007.250588
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The existing random sampling methods have at least one of the following disadvantages: they 1) are applicable only to certain update patterns, 2) entail large space overhead, or 3) incur prohibitive maintenance cost. These drawbacks prevent their effective application in stream environments (where a relation is updated by a large volume of insertions and deletions that may arrive in any order), despite the considerable success of random sampling in conventional databases. Motivated by this, we develop several fully dynamic algorithms for obtaining random samples from individual relations, and from the join result of two tables. Our solutions can handle any update pattern with small space and computational overhead. We also present an in-depth analysis that provides valuable insight into the characteristics of alternative sampling strategies and leads to precision guarantees. Extensive experiments validate our theoretical findings and demonstrate the efficiency of our techniques in practice.
Pages: 96 - 110
Page count: 15
Related papers
50 in total
  • [1] SAMPLING OF RANDOM DATA STREAMS
    Cepciansky, Gustav
    Schwartz, Ladislav
    ADVANCES IN ELECTRICAL AND ELECTRONIC ENGINEERING, 2011, 9 (01) : 1 - 6
  • [2] File Updates Under Random/Arbitrary Insertions And Deletions
    Wang, Qiwen
    Cadambe, Viveck
    Jaggi, Sidharth
    Schwartz, Moshe
    Medard, Muriel
    2015 IEEE INFORMATION THEORY WORKSHOP (ITW), 2015,
  • [3] Continuous Sampling from Distributed Streams
    Cormode, Graham
    Muthukrishnan, S.
    Yi, Ke
    Zhang, Qin
    JOURNAL OF THE ACM, 2012, 59 (02)
  • [4] File Updates Under Random/Arbitrary Insertions and Deletions
    Wang, Qiwen
    Jaggi, Sidharth
    Medard, Muriel
    Cadambe, Viveck R.
    Schwartz, Moshe
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2017, 63 (10) : 6487 - 6513
  • [5] Optimal Random Sampling from Distributed Streams Revisited
    Tirthapura, Srikanta
    Woodruff, David P.
    DISTRIBUTED COMPUTING, 2011, 6950 : 283 - +
  • [6] AB-tree: Index for Concurrent Random Sampling and Updates
    Zhao, Zhuoyue
    Xie, Dong
    Li, Feifei
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 15 (09) : 1835 - 1847
  • [7] Incremental updates of closed frequent itemsets over continuous data streams
    Li, Hua-Fu
    Ho, Chin-Chuan
    Lee, Suh-Yin
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (02) : 2451 - 2458
  • [8] Random sampling algorithms for sliding windows over data streams
    Zhang, LB
    Li, ZH
    Yu, M
    Wang, Y
    Jiang, Y
    PROCEEDINGS OF THE 11TH JOINT INTERNATIONAL COMPUTER CONFERENCE, 2005, : 572 - 575
  • [9] Random sampling algorithms for landmark windows over data streams
    Zhang Longbo
    Li Zhanhuai
    Yu Min
    Wang Yong
    Jiang Yun
    ICEIS 2006: PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATIONAL SYSTEMS: DATABASES AND INFORMATION SYSTEMS INTEGRATION, 2006, : 103 - +
  • [10] Boosting distinct random sampling for basic counting on the union of distributed streams
    Xu, Bojian
    THEORETICAL COMPUTER SCIENCE, 2015, 602 : 60 - 79