Stream Aggregation with Compressed Sliding Windows

Cited by: 2
Authors
Geethakumari, Prajith Ramakrishnan [1]
Sourdis, Ioannis [1]
Affiliations
[1] Chalmers Univ Technol, Comp Sci & Engn Dept, Rannvagen 6, S-41296 Gothenburg, Sweden
Funding
Swedish Research Council
Keywords
Compression; dataflow; aggregation; sliding windows; stream processing; SYSTEM;
DOI
10.1145/3590774
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
High-performance stream aggregation is critical for many emerging applications that analyze massive volumes of data. When the aggregation functions cannot be computed incrementally, incoming data must be stored in a sliding window during processing. Updating the window with new incoming values and reading it to feed the aggregation functions are the two primary steps in stream aggregation. Although window updates can be supported efficiently using multi-level queues, frequent window aggregations remain a performance bottleneck because they put tremendous pressure on memory bandwidth and capacity. This article addresses this problem by enhancing StreamZip, a dataflow stream aggregation engine that is able to compress the sliding windows. StreamZip deals with a number of data and control dependency challenges to integrate a compressor in the stream aggregation pipeline and alleviate the memory pressure posed by frequent aggregations. In addition, StreamZip incorporates a caching mechanism for dealing with skewed-key distributions in the incoming data stream. In doing so, StreamZip offers higher throughput as well as larger effective window capacity to support larger problems. StreamZip supports diverse compression algorithms, offering both lossless and lossy compression for integers as well as floating-point numbers. Compared to designs without compression, StreamZip lossless and lossy designs achieve up to 7.5x and 22x higher throughput, while improving the effective memory capacity by up to 5x and 23x, respectively.
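To make the idea of a compressed sliding window concrete, the following is a minimal software sketch, not StreamZip's actual hardware design. It assumes integer inputs, a window that slides at block granularity, and delta plus variable-length byte encoding as a stand-in lossless compressor. The names `CompressedWindow`, `compress_block`, and `decompress_block` are hypothetical and do not come from the article. The aggregation shown is a median, one example of a function that cannot be computed incrementally and therefore needs the stored window.

```python
from collections import deque
from statistics import median


def _zigzag(n: int) -> int:
    # Map signed deltas to unsigned ints so small magnitudes stay small.
    return 2 * n if n >= 0 else -2 * n - 1


def _unzigzag(z: int) -> int:
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def compress_block(values):
    """Delta-encode integers, then store each delta as a variable-length byte string."""
    out = bytearray()
    prev = 0
    for v in values:
        z = _zigzag(v - prev)
        prev = v
        while z >= 0x80:
            out.append((z & 0x7F) | 0x80)  # continuation bit set
            z >>= 7
        out.append(z)
    return bytes(out)


def decompress_block(data):
    """Inverse of compress_block."""
    values, prev, z, shift = [], 0, 0, 0
    for b in data:
        z |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += _unzigzag(z)
            values.append(prev)
            z, shift = 0, 0
    return values


class CompressedWindow:
    """Sliding window kept as compressed blocks (assumption: slides one block at a time)."""

    def __init__(self, block_size, max_blocks):
        self.block_size = block_size
        self.blocks = deque(maxlen=max_blocks)  # oldest block is evicted automatically
        self.pending = []  # newest values, not yet compressed

    def update(self, value):
        # Window update: buffer the value, compress once a block is full.
        self.pending.append(value)
        if len(self.pending) == self.block_size:
            self.blocks.append(compress_block(self.pending))
            self.pending = []

    def aggregate(self, fn=median):
        # Window aggregation: decompress every block and apply the function.
        values = [v for blk in self.blocks for v in decompress_block(blk)]
        values.extend(self.pending)
        return fn(values)

    def compressed_bytes(self):
        return sum(len(b) for b in self.blocks)
```

In this sketch, slowly varying inputs produce small deltas that fit in one byte each, so the stored window occupies fewer bytes than the raw values would, and each aggregation reads less data from memory at the cost of a decompression step.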
Pages: 28
Related papers
  • [1] StreamZip: Compressed Sliding-Windows for Stream Aggregation
    Geethakumari, Prajith Ramakrishnan
    Sourdis, Ioannis
    [J]. 2021 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT), 2021, : 203 - 211
  • [2] Mining compressed frequent itemsets over data stream in sliding windows
    Zhao, Li
    Tong, Yongxin
    Yu, Dan
    Ma, Shilong
    Chen, Mengdong
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INTELLIGENT SYSTEMS, PROCEEDINGS, VOL 1, 2009, : 713 - 717
  • [3] Maintaining stream statistics over sliding windows
    Datar, M
    Gionis, A
    Indyk, P
    Motwani, R
    [J]. SIAM JOURNAL ON COMPUTING, 2002, 31 (06) : 1794 - 1813
  • [4] Maintaining stream statistics over multiscale sliding windows
    Jiao, Yishan
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2006, 31 (04) : 1305 - 1334
  • [5] Subsuming Multiple Sliding Windows for Shared Stream Computation
    Patroumpas, Kostas
    Sellis, Timos
    [J]. ADVANCES IN DATABASES AND INFORMATION SYSTEMS, 2011, 6909 : 56 - 69
  • [6] Clustering on Uncertain Data Stream over Sliding Windows
    Tu, Li
    [J]. 2015 THIRD INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA, 2015, : 148 - 152
  • [7] Maintaining Significant Stream Statistics over Sliding Windows
    Lee, L. K.
    Ting, H. F.
    [J]. PROCEEDINGS OF THE SEVENTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, : 724 - 732
  • [8] Data stream treatment using sliding windows with MapReduce
    Jose Basgall, Maria
    Hasperue, Waldo
    Naiouf, Marcelo
    [J]. JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY, 2016, 16 (02) : 76 - 83
  • [9] A Sketch Framework for Approximate Data Stream Processing in Sliding Windows
    Gou, Xiangyang
    Zhang, Yinda
    Hu, Zhoujing
    He, Long
    Wang, Ke
    Liu, Xilai
    Yang, Tong
    Wang, Yi
    Cui, Bin
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (05) : 4411 - 4424
  • [10] Maintaining stream statistics over sliding windows (extended abstract)
    Datar, M
    Gionis, A
    Indyk, P
    Motwani, R
    [J]. PROCEEDINGS OF THE THIRTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2002, : 635 - 644