Random sampling for continuous streams with arbitrary updates

Cited by: 10
Authors
Tao, Yufei [1]
Lian, Xiang
Papadias, Dimitris
Hadjieleftheriou, Marios
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Sha Tin, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Clear Water Bay, Hong Kong, Peoples R China
[3] AT&T Labs, Florham Pk, NJ 07932 USA
Keywords
sampling; selectivity estimation
DOI
10.1109/TKDE.2007.250588
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The existing random sampling methods have at least one of the following disadvantages: they 1) are applicable only to certain update patterns, 2) entail large space overhead, or 3) incur prohibitive maintenance cost. These drawbacks prevent their effective application in stream environments (where a relation is updated by a large volume of insertions and deletions that may arrive in any order), despite the considerable success of random sampling in conventional databases. Motivated by this, we develop several fully dynamic algorithms for obtaining random samples from individual relations, and from the join result of two tables. Our solutions can handle any update pattern with small space and computational overhead. We also present an in-depth analysis that provides valuable insight into the characteristics of alternative sampling strategies and leads to precision guarantees. Extensive experiments validate our theoretical findings and demonstrate the efficiency of our techniques in practice.
Pages: 96-110
Page count: 15
Related papers
50 in total
  • [21] Recent Results on Processing Random-Order Streams and Space-Efficient Sampling
    McGregor, Andrew
    2008 46TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, VOLS 1-3, 2008, : 206 - 208
  • [22] SAMPLING WITH ARBITRARY CHOICE OF SAMPLING INSTANTS
    TROCH, I
    AUTOMATICA, 1973, 9 (01) : 117 - 124
  • [23] SAMPLING DISTRIBUTION OF THE W-STATISTIC OF DISJUNCTION FOR THE ARBITRARY DIVISION OF A RANDOM RECTANGULAR DISTRIBUTION
    SNEATH, PHA
    JOURNAL OF THE INTERNATIONAL ASSOCIATION FOR MATHEMATICAL GEOLOGY, 1979, 11 (04): : 423 - 429
  • [25] SAGA with Arbitrary Sampling
    Qian, Xun
    Qu, Zheng
    Richtárik, Peter
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [26] MATCHING OR RANDOM SAMPLING FOR CASE-CONTROL STUDIES WITH CONTINUOUS COVARIATES
    WACHOLDER, S
    AMERICAN JOURNAL OF EPIDEMIOLOGY, 1985, 122 (03) : 522 - 522
  • [27] Simulating Random Walks in Random Streams
    Kallaugher, John
    Kapralov, Michael
    Price, Eric
    PROCEEDINGS OF THE 2022 ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, SODA, 2022, : 3091 - 3126
  • [28] Counting Arbitrary Subgraphs in Data Streams
    Kane, Daniel M.
    Mehlhorn, Kurt
    Sauerwald, Thomas
    Sun, He
    AUTOMATA, LANGUAGES, AND PROGRAMMING, ICALP 2012, PT II, 2012, 7392 : 598 - 609
  • [29] Sampling and Recovery of Pulse Streams
    Hegde, Chinmay
    Baraniuk, Richard G.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2011, 59 (04) : 1505 - 1517
  • [30] SURFACE SAMPLING IN GRAVEL STREAMS
    FRIPP, JB
    DIPLAS, P
    JOURNAL OF HYDRAULIC ENGINEERING-ASCE, 1993, 119 (04): : 473 - 490