Supporting Real-Time Analytic Queries in Big and Fast Data Environments

Cited by: 4
Authors
Wu, Guangjun [1]
Yun, Xiaochun [1]
Li, Chao [2]
Wang, Shupeng [1]
Wang, Yipeng [1]
Zhang, Xiaoyu [1]
Jia, Siyu [1]
Zhang, Guangyan [3]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100029, Peoples R China
[2] Natl Comp Network & Informat Secur Adm Ctr, Beijing 100031, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Keywords
Approximate answering; Big data; Data streams; Distributed computing; Sampling
DOI
10.1007/978-3-319-55699-4_29
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recently, there has been significant interest in performing real-time analytical queries in systems that can handle both "big data" and "fast data". In this paper, we propose an approximate answering approach, called ROSE, which can manage big and fast data streams and support complex analytical queries against them. To achieve this goal, we start with an analysis of existing query processing techniques in big data systems to understand the requirements for building a distributed analytic sketch. We then propose a sampling-based sketch that can extract multi-faced samples from asynchronous data streams, and we augment its usability with accuracy-lossless distributed sketch construction operations, such as splitting, merging, and union. Experimental results on real-world data sets indicate that, compared with the state-of-the-art approximate answering engine BlinkDB, our techniques obtain more accurate estimates and a 2x improvement in system throughput. Compared with the distributed in-memory computing system Spark, our system achieves an improvement of 2 orders of magnitude in query response time.
Pages: 477-493
Number of pages: 17