Order statistics and estimating cardinalities of massive data sets

Cited by: 54
Author
Giroire, Frederic [1]
Affiliation
[1] INRIA Rocquencourt, ALGO Project, F-78153 Le Chesnay, France
Keywords
Cardinality estimates; Algorithm analysis; Very large multisets; Traffic analysis; Active flows
DOI
10.1016/j.dam.2008.06.020
Chinese Library Classification number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
A new class of algorithms is introduced to estimate the cardinality of very large multisets using constant memory and a single pass over the data. It is based on order statistics rather than on bit patterns in binary representations of numbers. Three families of estimators are analyzed. They attain a standard error of 1/√M using M units of storage, which places them in the same class as the best algorithms known so far. The algorithms have a very simple internal loop, which gives them an advantage in terms of processing speed. For instance, a memory of only 12 kB and only a few seconds are sufficient to process a multiset with several million elements and to build an estimate with accuracy of order 2 percent. The algorithms are validated both by mathematical analysis and by experiments on real internet traffic. (C) 2008 Elsevier B.V. All rights reserved.
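As an illustration of the general idea in the abstract (estimating a distinct count from order statistics of hashed values, in one pass and with bounded memory), the following is a minimal sketch of a k-th-minimum estimator. It is not necessarily any of the three estimator families analyzed in the paper; the function names, the use of SHA-256 as the hash, and the default k = 256 are assumptions made for this example only.

```python
import hashlib
import heapq


def hash01(item: str) -> float:
    """Map an item to a pseudo-uniform value in (0, 1]. SHA-256 is an assumed choice."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 1)


def estimate_cardinality(stream, k: int = 256) -> float:
    """One-pass estimate of the number of distinct items in `stream`.

    Keeps only the k smallest distinct hash values seen so far. If the
    hashes are uniform, the k-th smallest one sits near k / n, so
    (k - 1) / (k-th smallest) estimates n.
    """
    heap = []        # max-heap (values negated) holding the k smallest hashes
    members = set()  # the same k values, for fast duplicate detection
    for item in stream:
        h = hash01(item)
        if h in members:
            continue  # duplicate of a retained value: no effect
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # h is smaller than the current k-th minimum: replace it
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct items: count is exact
    return (k - 1) / (-heap[0])
```

Memory use depends only on k, not on the length of the stream, and duplicates never change the retained values because equal items hash to the same number. The relative error of this kind of estimator shrinks roughly like 1/√k, which matches the order of the 1/√M behaviour quoted in the abstract.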
Pages: 406-427
Number of pages: 22
Related papers
50 in total
  • [1] On the Worst Case Data Sets for Order Statistics
    Wang, Lei
    Wang, Xiaodong
    [J]. APPLIED MATHEMATICS & INFORMATION SCIENCES, 2012, 6 (02) : 357 - 362
  • [2] Massive Data Exploration using Estimated Cardinalities
    Nerzic, Pierre
    Smits, Gregory
    Pivert, Olivier
    Lesot, Marie-Jeanne
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2022,
  • [3] Fuzzy sets in data protection: strategies and cardinalities
    Diaz, Irene
    Rodriguez-Muniz, Luis J.
    Troiano, Luigi
    [J]. LOGIC JOURNAL OF THE IGPL, 2012, 20 (04) : 657 - 666
  • [4] SUBSAMPLING WITH K DETERMINANTAL POINT PROCESSES FOR ESTIMATING STATISTICS IN LARGE DATA SETS
    Amblard, Pierre-Olivier
    Barthelme, Simon
    Tremblay, Nicolas
    [J]. 2018 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP (SSP), 2018, : 313 - 317
  • [5] CARDINALITIES OF ULTRAPRODUCTS OF FINITE SETS
    KOPPELBERG, S
    [J]. JOURNAL OF SYMBOLIC LOGIC, 1980, 45 (03) : 574 - 584
  • [6] An axiomatic approach to cardinalities of IF sets
    Král, P
    [J]. COMPUTATIONAL INTELLIGENCE, THEORY AND APPLICATIONS, 2005, : 681 - 691
  • [7] Cardinalities of scrambled sets and positive scrambled sets
    Mai, Jiehua
    Zhou, Lei
    Sun, Taixiang
    [J]. TOPOLOGY AND ITS APPLICATIONS, 2024, 342
  • [8] Generalized fuzzy cardinalities of IF sets
    Kral, Pavel
    [J]. Computational Intelligence, Theory and Application, 2006, : 251 - 261
  • [9] Estimating Cardinalities with Deep Sketches
    Kipf, Andreas
    Vorona, Dimitri
    Mueller, Jonas
    Kipf, Thomas
    Radke, Bernhard
    Leis, Viktor
    Boncz, Peter
    Neumann, Thomas
    Kemper, Alfons
    [J]. SIGMOD '19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2019, : 1937 - 1940
  • [10] Probabilities for separating sets of order statistics
    Glueck, D. H.
    Karimpour-Fard, A.
    Mandel, J.
    Muller, K. E.
    [J]. STATISTICS, 2010, 44 (02) : 145 - 153