Load shedding for window joins on multiple data streams

Cited by: 4
Authors
Law, Yan-Nei [1]
Zaniolo, Carlo [2]
Affiliations
[1] Bioinformat Inst, 30 Biopolis St, Singapore 138671, Singapore
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
DOI
10.1109/ICDEW.2007.4401054
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of semantic load shedding for continuous queries containing window joins on multiple data streams and propose a robust approach that is effective under the different semantic accuracy criteria required by different applications. In fact, our approach can be used to (i) maximize the number of output tuples produced by joins, and (ii) optimize the accuracy of complex aggregate estimates under uniform random sampling. We first consider the problem of computing maximal subsets of approximate window joins over multiple data streams. Previously proposed approaches are based on multiple pairwise joins and, in their load-shedding decisions, disregard the content of streams outside the joined pairs. To overcome these limitations, we optimize our load-shedding policy using various predictors of the productivity of each tuple in the window. To minimize processing costs, we use a fast and light sketching technique to estimate the productivity of the tuples. We then show that our method can be generalized to produce statistically accurate samples, as needed in, e.g., the computation of averages, quantiles, and stream mining queries. Tests performed on both synthetic and real-life data demonstrate that our method outperforms previous approaches while requiring comparable amounts of time and space.
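The abstract does not spell out which productivity predictors or which sketch the authors use. As a rough illustration only, the Python sketch below assumes an n-way equi-join on a single key shared by all streams, count-based windows, a fixed memory budget measured in tuples, and a Count-Min sketch as the frequency estimator. A tuple's productivity is approximated by the product of its key's estimated frequencies in the other streams' windows, so the estimate reflects every stream rather than a single joined pair, and the lowest-scoring tuple is shed when the budget is exceeded. The names `CountMinSketch`, `ProductivityShedder`, `window_size`, and `memory_budget` are hypothetical; this is not the authors' algorithm.

```python
import hashlib
from collections import deque

# Illustrative sketch only; the predictor and sketch are assumptions, not the paper's method.


class CountMinSketch:
    """Count-Min sketch with increments and decrements (strict turnstile model)."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, key):
        # One bucket per row; each row hashes with BLAKE2b under a different salt.
        for row in range(self.depth):
            h = hashlib.blake2b(repr(key).encode(), salt=bytes([row]) * 16)
            yield row, int.from_bytes(h.digest()[:8], "big") % self.width

    def update(self, key, delta):
        for row, col in self._buckets(key):
            self.table[row][col] += delta

    def estimate(self, key):
        # Never underestimates as long as every true count stays >= 0.
        return min(self.table[row][col] for row, col in self._buckets(key))


class ProductivityShedder:
    """Hypothetical shedder for an n-way equi-join over count-based windows."""

    def __init__(self, n_streams, window_size, memory_budget):
        self.n = n_streams
        self.window_size = window_size  # each window covers the last W arrivals
        self.budget = memory_budget  # max tuples held across all windows
        self.arrivals = [0] * n_streams
        self.windows = [deque() for _ in range(n_streams)]  # (seq, key) pairs
        self.sketches = [CountMinSketch() for _ in range(n_streams)]
        self.shed = 0

    def productivity(self, stream, key):
        # Predicted join output of a tuple: product of its key's estimated
        # frequencies in every *other* stream's window.
        score = 1
        for j in range(self.n):
            if j != stream:
                score *= self.sketches[j].estimate(key)
        return score

    def _remove(self, stream, idx):
        _, key = self.windows[stream][idx]
        del self.windows[stream][idx]
        self.sketches[stream].update(key, -1)

    def insert(self, stream, key):
        seq = self.arrivals[stream]
        self.arrivals[stream] += 1
        window = self.windows[stream]
        window.append((seq, key))
        self.sketches[stream].update(key, +1)
        # Expire tuples that fell out of the last `window_size` arrivals.
        while window and window[0][0] < self.arrivals[stream] - self.window_size:
            self._remove(stream, 0)
        if sum(len(w) for w in self.windows) > self.budget:
            # Over budget: shed the tuple with the lowest predicted productivity.
            victim = min(
                ((s, i) for s in range(self.n) for i in range(len(self.windows[s]))),
                key=lambda si: self.productivity(si[0], self.windows[si[0]][si[1]][1]),
            )
            self._remove(*victim)
            self.shed += 1

    def keys(self, stream):
        return [key for _, key in self.windows[stream]]


if __name__ == "__main__":
    shedder = ProductivityShedder(n_streams=3, window_size=10, memory_budget=5)
    for s, k in [(0, "a"), (1, "a"), (2, "a"), (0, "z"), (1, "a"), (2, "b")]:
        shedder.insert(s, k)
    print([shedder.keys(s) for s in range(3)], shedder.shed)
```

In this toy run the tuple with key `z` in stream 0 has no partner in streams 1 or 2, so its predicted productivity is zero and it is the one shed once the sixth tuple exceeds the budget. Scanning every window on each overflow is linear in memory size and is kept only for readability; a practical shedder would maintain a priority structure instead.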
Pages: 674-+
Number of pages: 2
Related papers
  • [21] Semantic load shedding for prioritized continuous queries over data streams
    Park, J
    Cho, H
    COMPUTER AND INFORMATION SCIENCES - ISCIS 2005, PROCEEDINGS, 2005, 3733 : 813 - 822
  • [22] Adaptive load shedding for mining frequent patterns from data streams
    Dang, Xuan Hong
    Ng, Wee-Keong
    Ong, Kok-Leong
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4081 : 342 - 351
  • [23] Semantic Load Shedding over Real-Time Data Streams
    Ma, Li
    Zhang, Qiongsheng
    Wang, Kun
    Li, Xin
    Wang, Hongan
    PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN, VOL 1, 2008: 465 - +
  • [24] LOAD SHEDDING FOR WINDOWED NON-EQUIJOIN OVER SENSOR DATA STREAMS
    Ren, Jiadong
    Huo, Cong
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2009, 5 (05): 1265 - 1273
  • [25] Towards Efficient KNN Joins on Data Streams
    Yang, Chong
    Yu, Xiaohui
    Liu, Yang
    2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014: 782 - 783
  • [26] Window-based multiple continuous query algorithm for data streams
    Liu, Wen
    Zhang, Tuqian
    Liu, Junxia
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (09): 5782 - 5807
  • [27] Simultaneous sliding window join approach over multiple data streams
    Qian, Jiangbo
    Xu, Hongbing
    Wang, Yongli
    Liu, Xuejun
    Dong, Yisheng
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2005, 42 (10): 1771 - 1778
  • [29] Concept-Driven Load Shedding: Reducing Size and Error of Voluminous and Variable Data Streams
    Katsipoulakis, Nikos R.
    Labrinidis, Alexandros
    Chrysanthis, Panos K.
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018: 418 - 427
  • [30] ClusterSheddy: Load shedding using moving clusters over spatio-temporal data streams
    Nehme, Rimma V.
    Rundensteiner, Elke A.
    ADVANCES IN DATABASES: CONCEPTS, SYSTEMS AND APPLICATIONS, 2007, 4443 : 637 - +