Pseudo-random number generation for sketch-based estimations

Cited by: 12
Authors
Rusu, Florin [1]
Dobra, Alin [1]
Affiliation
[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA
Source
ACM Transactions on Database Systems | 2007 / Vol. 32 / No. 2
Keywords
algorithms; experimentation; performance; theory; sketches; data synopses; approximate query processing; fast range-summation
DOI
10.1145/1242524.1242528
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The exact computation of aggregate queries, like the size of join of two relations, usually requires large amounts of memory (constrained in data-streaming) or communication (constrained in distributed computation) and large processing times. In this situation, approximation techniques with provable guarantees, like sketches, are one possible solution. The performance of sketches depends crucially on the ability to generate particular pseudo-random numbers. In this article we investigate both theoretically and empirically the problem of generating k-wise independent pseudo-random numbers and, in particular, that of generating 3- and 4-wise independent pseudo-random numbers that are fast range-summable (i.e., they can be summed in sublinear time). Our specific contributions are: (a) we provide a thorough comparison of the various pseudo-random number generating schemes; (b) we study both theoretically and empirically the fast range-summation property of 3- and 4-wise independent generating schemes; (c) we provide algorithms for the fast range-summation of two 3-wise independent schemes, BCH and extended Hamming; and (d) we show convincing theoretical and empirical evidence that the extended Hamming scheme performs as well as any 4-wise independent scheme for estimating the size of join of two relations using AMS sketches, even though it is only 3-wise independent. We use this scheme to generate estimators that significantly outperform state-of-the-art solutions for two problems, namely, size of spatial joins and selectivity estimation.
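To make the abstract's terms concrete, here is a minimal Python sketch under stated assumptions. It assumes a BCH3-style generator xi_i = (-1)^(s0 XOR parity(s1 AND i)), with a random bit s0 and a random n-bit seed s1. This generator is 3-wise independent but not 4-wise, and it is not the extended Hamming scheme the abstract recommends. The sketch shows two things: fast range-summation by splitting [lo, hi) into aligned dyadic blocks (one simple approach that may differ from the paper's own algorithms), and an AMS-style size-of-join estimate that averages X_F * X_G, where X_F = sum_i f_i * xi_i. All function names (`bch3_sign`, `dyadic_block_sum`, `fast_range_sum`, `ams_join_estimate`) are hypothetical.

```python
import random


def bch3_sign(s0: int, s1: int, i: int) -> int:
    """Assumed BCH3-style +/-1 variable: xi_i = (-1)^(s0 XOR parity(s1 AND i))."""
    bit = s0 ^ (bin(s1 & i).count("1") & 1)
    return -1 if bit else 1


def dyadic_block_sum(s0: int, s1: int, prefix: int, k: int) -> int:
    """Sum of xi_i over the aligned block [prefix * 2^k, (prefix + 1) * 2^k).

    Inside the block the low k bits of i take every value, so the signs
    cancel unless the low k bits of s1 are all zero.
    """
    if s1 & ((1 << k) - 1):
        return 0
    return (1 << k) * bch3_sign(s0, s1, prefix << k)


def fast_range_sum(s0: int, s1: int, lo: int, hi: int) -> int:
    """Sum of xi_i for lo <= i < hi using O(log(hi - lo)) dyadic blocks."""
    total = 0
    while lo < hi:
        # Grow the block while lo stays aligned and the block fits in [lo, hi).
        k = 0
        while lo % (1 << (k + 1)) == 0 and lo + (1 << (k + 1)) <= hi:
            k += 1
        total += dyadic_block_sum(s0, s1, lo >> k, k)
        lo += 1 << k
    return total


def ams_join_estimate(f: dict, g: dict, seeds: list) -> float:
    """Average of X_F * X_G over seeds, where X_F = sum_i f[i] * xi_i."""
    acc = 0
    for s0, s1 in seeds:
        xf = sum(c * bch3_sign(s0, s1, i) for i, c in f.items())
        xg = sum(c * bch3_sign(s0, s1, i) for i, c in g.items())
        acc += xf * xg
    return acc / len(seeds)


if __name__ == "__main__":
    n_bits = 8
    rng = random.Random(0)
    s0, s1 = rng.getrandbits(1), rng.getrandbits(n_bits)

    # Fast range-summation should agree with the brute-force sum.
    lo, hi = 13, 200
    brute = sum(bch3_sign(s0, s1, i) for i in range(lo, hi))
    print(brute, fast_range_sum(s0, s1, lo, hi))

    # Size of join of two relations given as value -> frequency maps.
    f = {3: 5, 7: 2, 42: 1}
    g = {3: 4, 42: 6, 99: 3}
    exact = sum(c * g.get(v, 0) for v, c in f.items())
    seeds = [(rng.getrandbits(1), rng.getrandbits(n_bits)) for _ in range(200)]
    print(exact, ams_join_estimate(f, g, seeds))
```

A block sum is either zero or plus or minus 2^k, so each range costs only a logarithmic number of block lookups instead of one term per element. Pairwise independence is already enough for E[X_F * X_G] to equal the join size sum_i f_i * g_i. The usual AMS variance bound, however, relies on 4-wise independence, which is why the abstract's claim about the 3-wise independent extended Hamming scheme is notable.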
Pages: 48