Panakos: Chasing the Tails for Multidimensional Data Streams

Cited by: 2
Authors
Zhao, Fuheng [1 ]
Khan, Punnal Ismail [1 ]
Agrawal, Divyakant [1 ]
El Abbadi, Amr [1 ]
Gupta, Arpit [1 ]
Liu, Zaoxing [2 ]
Affiliations
[1] UC Santa Barbara, Santa Barbara, CA 93106 USA
[2] Boston Univ, Boston, MA 02215 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2023, Vol. 16, No. 6
DOI
10.14778/3583140.3583147
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
System operators are often interested in extracting different feature streams from multi-dimensional data streams and reporting their distributions at regular intervals, including the heavy hitters that contribute to the tail portion of the feature distribution. Satisfying these requirements at increasing data rates with limited resources is challenging. This paper presents the design and implementation of Panakos, which makes the best use of available resources to accurately report a given feature's distribution, its tail contributors, and other stream statistics (e.g., cardinality and entropy). Our key idea is to leverage the skewness inherent to most real-world feature streams by disentangling the feature stream into hot, warm, and cold items based on their feature values, and then using a different data structure to track the items in each category. Panakos provides solid theoretical guarantees and achieves high performance for various tasks. We have implemented Panakos in both software and hardware and compared it with other state-of-the-art sketches using synthetic and real-world datasets. The experimental results demonstrate that Panakos often achieves one order of magnitude better accuracy than state-of-the-art solutions for a given memory budget.
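The hot/warm/cold split described in the abstract can be pictured with a small sketch. The abstract does not say which data structures Panakos uses, so everything below is an assumption made for illustration: a bitmap absorbs the first occurrence of an item (cold), a small array of saturating counters absorbs the next few occurrences (warm), and a plain dictionary holds the overflow for frequent items (hot). The class name `TieredCounter`, its parameters, and the hashing helper are hypothetical and are not taken from the paper.

```python
import hashlib


def _h(item: str, seed: int, width: int) -> int:
    """Deterministic hash of `item` into [0, width), varied by `seed`."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % width


class TieredCounter:
    """Illustrative three-tier frequency counter (not the Panakos design)."""

    def __init__(self, bitmap_bits=4096, rows=3, width=1024, warm_cap=15):
        # Cold tier: one flag per slot (a byte per flag, for readability).
        self.bitmap = bytearray(bitmap_bits)
        # Warm tier: rows of small counters that saturate at warm_cap.
        self.warm = [[0] * width for _ in range(rows)]
        self.warm_cap = warm_cap
        # Hot tier: exact overflow counts. Unbounded here; a real system
        # would need a bounded heavy-hitter summary instead.
        self.hot = {}

    def _cells(self, item):
        return [(r, _h(item, r + 1, len(row))) for r, row in enumerate(self.warm)]

    def update(self, item: str) -> None:
        slot = _h(item, 0, len(self.bitmap))
        if not self.bitmap[slot]:
            # First time this slot is touched: the bitmap records it.
            self.bitmap[slot] = 1
            return
        cells = self._cells(item)
        if min(self.warm[r][c] for r, c in cells) < self.warm_cap:
            # Warm tier still has room: bump every unsaturated counter.
            for r, c in cells:
                if self.warm[r][c] < self.warm_cap:
                    self.warm[r][c] += 1
            return
        # Warm counters are saturated: count the overflow exactly.
        self.hot[item] = self.hot.get(item, 0) + 1

    def estimate(self, item: str) -> int:
        """Estimated frequency; hash collisions can only inflate it."""
        if not self.bitmap[_h(item, 0, len(self.bitmap))]:
            return 0
        warm = min(self.warm[r][c] for r, c in self._cells(item))
        return 1 + warm + self.hot.get(item, 0)


counter = TieredCounter()
for x in ["heavy"] * 1000 + ["mid"] * 10 + ["rare"]:
    counter.update(x)
print(counter.estimate("heavy"), counter.estimate("mid"), counter.estimate("rare"))
```

On a skewed stream, most items never leave the cheap bitmap tier, and only the few frequent items reach the exact overflow table. This is the intuition behind spending memory unevenly across categories. The estimate never falls below the true count, but collisions in the bitmap or the counter rows can push it higher.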
Pages: 1291-1304
Page count: 14