Load shedding for window joins on multiple data streams

Cited by: 4
Authors
Law, Yan-Nei [1]
Zaniolo, Carlo [2]
Affiliations
[1] Bioinformat Inst, 30 Biopolis St, Singapore 138671, Singapore
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
DOI
10.1109/ICDEW.2007.4401054
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider the problem of semantic load shedding for continuous queries containing window joins on multiple data streams, and propose a robust approach that remains effective under the different semantic accuracy criteria required by different applications. In particular, our approach can be used to (i) maximize the number of output tuples produced by joins, and (ii) optimize the accuracy of complex aggregate estimates under uniform random sampling. We first consider the problem of computing maximal subsets of approximate window joins over multiple data streams. Previously proposed approaches are based on multiple pair-wise joins and, in their load-shedding decisions, disregard the content of streams outside the joined pairs. To overcome these limitations, we optimize our load-shedding policy using various predictors of the productivity of each tuple in the window. To minimize processing costs, we use a fast and light sketching technique to estimate the productivity of the tuples. We then show that our method can be generalized to produce statistically accurate samples, as needed in, e.g., the computation of averages, quantiles, and stream mining queries. Tests performed on both synthetic and real-life data demonstrate that our method outperforms previous approaches while requiring comparable amounts of time and space.
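As a rough illustration of the idea in the abstract, the sketch below estimates the productivity of a tuple in one stream's window as the product of the approximate counts of its join key in every other stream's window, then sheds the least productive tuples. It is only a sketch under stated assumptions: the abstract does not say which sketching technique or which predictor the authors use, so the count-min sketch, the equi-join on a single key, and the names CountMinSketch, productivity, and shed are all hypothetical choices made for this example.

```python
import hashlib
from typing import Dict, Hashable, List, Tuple


class CountMinSketch:
    """Small count-min sketch (an assumed stand-in for the paper's sketch)."""

    def __init__(self, width: int = 64, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key: Hashable, row: int) -> int:
        # One deterministic hash per row, derived from the row number.
        data = f"{row}:{key!r}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key: Hashable, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key: Hashable) -> int:
        # Count-min never underestimates; take the smallest counter.
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))


def productivity(key: Hashable, own_stream: str,
                 sketches: Dict[str, CountMinSketch]) -> int:
    """Estimated number of equi-join results a tuple with `key` produces:
    the product of the key's estimated counts in all other streams."""
    result = 1
    for name, sketch in sketches.items():
        if name != own_stream:
            result *= sketch.estimate(key)
    return result


def shed(window: List[Tuple[Hashable, object]], own_stream: str,
         sketches: Dict[str, CountMinSketch],
         capacity: int) -> List[Tuple[Hashable, object]]:
    """Keep the `capacity` tuples with the highest estimated productivity."""
    ranked = sorted(window,
                    key=lambda t: productivity(t[0], own_stream, sketches),
                    reverse=True)
    return ranked[:capacity]


# Hypothetical three-way join over streams A, B, and C on an integer key.
sketches = {name: CountMinSketch() for name in ("A", "B", "C")}
for key in (1, 1, 1):
    sketches["B"].add(key)
for key in (1, 1, 2):
    sketches["C"].add(key)

window_a = [(1, "a"), (2, "b"), (3, "c")]
kept = shed(window_a, "A", sketches, capacity=1)
```

Unlike a pair-wise policy, the score here depends on the contents of every other stream at once: a tuple whose key is frequent in B but absent from C gets a productivity of zero and is shed first.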
Pages: 674 / +
Number of pages: 2