Dynamic Sketching over Distributed Data Streams

Authors
Wu, Guangjun [1]
Jia, Siyu [1]
Li, Binbin [1]
Wang, Shupeng [1]
Bao, Xiuguo [1, 2]
Yuan, Qingsheng [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100190, Peoples R China
[2] CNCERT CC, Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Beijing 100029, Peoples R China
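Keywords
approximate answering; big data; data streams; fast data; sketch
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract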
Many emerging applications impose strict query response time requirements on a variety of operators over distributed streaming data. As a result, approximate query answering based on accurate sketches has become an important way to process fast-arriving streams. In this paper, we propose a dynamic sketching framework that samples elements from streams with out-of-order data arrival and provides an error-guaranteed estimation scheme for many different operators. Within the sketch, we first extract the characteristics of uniform sampling and exponential sampling from one-pass streaming data and organize them to support (ξ, δ)-approximation for different operators, such as aggregation operators (e.g., sum, count) and quantile operators (e.g., quantiles, median). Moreover, we construct the sketch dynamically and without loss of accuracy, using operations such as sketch splitting and sketch merging, without any a priori knowledge. The experimental results indicate that, compared with big data analytic systems (Spark, BlinkDB), our approach achieves a 3x improvement in throughput and an improvement of 2 orders of magnitude in query response time.
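The abstract names uniform sampling, aggregation and quantile operators, and sketch merging, but this record does not reproduce the authors' algorithm. As a rough illustration of the general idea only, the following is a minimal sketch of a fixed-size uniform sample (classic reservoir sampling) that answers count, sum, and quantile queries approximately and can be merged across stream partitions. The class name `UniformSketch` and all of its methods are hypothetical and are not taken from the paper; exponential sampling, sketch splitting, and the (ξ, δ) error analysis are not modeled here.

```python
import random


class UniformSketch:
    """Hypothetical fixed-size uniform sample of a stream (reservoir sampling).

    Illustration only: this is not the sketch structure from the paper.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.n = 0                      # number of elements seen so far
        self.sample = []                # at most `capacity` retained elements
        self.rng = random.Random(seed)

    def add(self, x):
        """Offer one element; arrival order does not affect uniformity."""
        self.n += 1
        if len(self.sample) < self.capacity:
            self.sample.append(x)
        else:
            # Keep x with probability capacity / n, evicting a random slot.
            j = self.rng.randrange(self.n)
            if j < self.capacity:
                self.sample[j] = x

    def count(self):
        return self.n

    def estimate_sum(self):
        """Scale the sample mean by the number of elements seen."""
        if not self.sample:
            return 0.0
        return self.n * sum(self.sample) / len(self.sample)

    def estimate_quantile(self, q):
        """Return the sample element at rank floor(q * sample size)."""
        if not self.sample:
            raise ValueError("empty sketch")
        s = sorted(self.sample)
        return s[min(len(s) - 1, int(q * len(s)))]

    def merge(self, other, seed=None):
        """Return a new sketch that is a uniform sample of both streams."""
        if self.capacity != other.capacity:
            raise ValueError("this illustration requires equal capacities")
        out = UniformSketch(self.capacity, seed)
        out.n = self.n + other.n
        a, b = list(self.sample), list(other.sample)
        na, nb = self.n, other.n        # stream elements not yet "drawn"
        while len(out.sample) < out.capacity and na + nb > 0:
            # Pick the source stream in proportion to its remaining size,
            # then take a random retained element from it without replacement.
            if out.rng.randrange(na + nb) < na:
                src, na = a, na - 1
            else:
                src, nb = b, nb - 1
            out.sample.append(src.pop(out.rng.randrange(len(src))))
        return out
```

In a distributed setting, each node would maintain its own sketch and a coordinator would call `merge` on them before answering a query. While a stream is shorter than `capacity` the answers are exact; beyond that they are estimates whose error shrinks as `capacity` grows.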
Pages: 2