A Sampling-based Hybrid Approximate Query Processing System in the Cloud

Times cited: 3
Authors
Wang, Yuxiang [1]
Luo, Junzhou [1]
Song, Aibo [1]
Dong, Fang [1]
Affiliation
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICPP.2014.38
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Sampling-based approximate query processing lets users of 'Big Data' analytical applications save time and resources: if an estimated result meets their accuracy expectations early, they need not wait a long time for the final exact result. Online aggregation (OLA) is one such technique; it answers aggregation queries with approximate results whose confidence intervals tighten over time. OLA has been built into MapReduce-based cloud systems for big data analytics, allowing users to monitor query progress and save money by killing the computation early once sufficient accuracy has been obtained. Unfortunately, a major obstacle remains: OLA estimation can fail when the sample set is biased and thus violates the unbiasedness assumption underlying OLA sampling, and such failures degrade OLA performance. To address this problem, we first propose a hybrid approximate query processing model that improves overall OLA performance. Its dynamic scheme-switching mechanism moves unpromising OLA queries to a bootstrap scheme for further processing, avoiding the full-dataset scan that an OLA estimation failure would otherwise cause. We also present a progressive estimation method that reduces the false-positive ratio of the dynamic scheme-switching mechanism. Finally, we implemented the hybrid approximate query processing system in Hadoop and conducted extensive experiments on the TPC-H benchmark with skewed data distributions. The results show that the hybrid system produces acceptable approximate results in one order of magnitude less time than the original OLA over Hadoop.
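The abstract's core OLA idea, an aggregate estimate whose confidence interval tightens as more tuples are read plus an early stop once the interval is narrow enough, can be illustrated with a short sketch. It is a generic, textbook-style online aggregation for AVG using a CLT-based interval (with Welford's running variance), alongside a percentile bootstrap interval over a fixed sample for comparison. It is not the authors' hybrid system, their scheme-switching mechanism, or their progressive estimation method. The function names `online_avg` and `bootstrap_avg_ci`, the stopping rule, the synthetic data, and the parameters `z`, `rel_error`, `min_n`, `reps`, and `alpha` are hypothetical choices made for illustration.

```python
import math
import random
import statistics


# Hypothetical helper (not from the paper): online aggregation for AVG.
# After each tuple, keep a running mean and a CLT-based confidence interval
# of half-width z * s / sqrt(n); stop early once the half-width drops below
# rel_error * |mean|, i.e. once an assumed accuracy target is met.
def online_avg(stream, z=1.96, rel_error=0.01, min_n=30):
    n, mean, m2 = 0, 0.0, 0.0  # m2: Welford's running sum of squared deviations
    half_width = math.inf
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_n:  # require a minimum sample before trusting the CLT
            s = math.sqrt(m2 / (n - 1))
            half_width = z * s / math.sqrt(n)
            if half_width <= rel_error * abs(mean):
                return mean, half_width, n, True  # stopped early
    return mean, half_width, n, False  # scanned the whole input


# Hypothetical helper: percentile bootstrap CI for AVG over a fixed sample,
# resampling the sample with replacement `reps` times.
def bootstrap_avg_ci(sample, reps=1000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    k = len(sample)
    means = sorted(statistics.fmean(rng.choices(sample, k=k)) for _ in range(reps))
    lo = means[int(alpha / 2 * reps)]
    hi = means[min(reps - 1, int((1 - alpha / 2) * reps))]
    return statistics.fmean(sample), lo, hi


# Demo on synthetic data (an assumption, not the paper's TPC-H workload).
rng = random.Random(42)
data = [rng.gauss(100.0, 15.0) for _ in range(100_000)]
rng.shuffle(data)  # emulate random arrival order, which the CLT interval assumes
mean, hw, n, early = online_avg(data)
print(f"OLA AVG ~ {mean:.2f} +/- {hw:.2f} after {n}/{len(data)} tuples")
est, lo, hi = bootstrap_avg_ci(data[:500])
print(f"bootstrap AVG ~ {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The random arrival order matters here: a CLT-based interval is only trustworthy when tuples are read in random order. If the input were sorted on the aggregated column, the running interval could shrink around a badly wrong value, which is the kind of biased-sample estimation failure the abstract describes.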
Pages: 291 - 300
Number of pages: 10
Related papers
50 in total
  • [1] Sampling-Based Approximate Skyline Query in Sensor Equipped IoT Networks
    Li, Ji
    Sai, Akshita Maradapu Vera Venkata
    Cheng, Xiuzhen
    Cheng, Wei
    Tian, Zhi
    Li, Yingshu
    [J]. TSINGHUA SCIENCE AND TECHNOLOGY, 2021, 26 (02) : 219 - 229
  • [2] Opportunistic sampling-based query processing in wireless sensor networks
    Umer, Muhammad
    Tanin, Egemen
    Kulik, Lars
    [J]. GEOINFORMATICA, 2013, 17 (04) : 567 - 597
  • [3] Optimized stratified sampling for approximate query processing
    Chaudhuri, Surajit
    Das, Gautam
    Narasayya, Vivek
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2007, 32 (02)
  • [4] Sampling-Based Query Re-Optimization
    Wu, Wentao
    Naughton, Jeffrey F.
    Singh, Harneet
    [J]. SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016 : 1721 - 1736
  • [5] A Sampling-Based System for Approximate Big Data Analysis on Computing Clusters
    Salloum, Salman
    Wu, Yinxu
    Huang, Joshua Zhexue
    [J]. PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019 : 2481 - 2484
  • [6] Sampling-based Collision Warning System with Smartphone in Cloud Computing Environment
    Tak, S.
    Woo, S.
    Yeo, H.
    [J]. 2015 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2015 : 1181 - 1186
  • [7] Asynchronous Sampling-Based Hybrid Equalizer
    Kocaman, Namik
    Green, Michael M.
    [J]. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2023, 31 (07) : 1014 - 1025
  • [8] Sampling-based approximate skyline calculation on big data
    Xiao, Xingxing
    Li, Jianzhong
    [J]. DISCRETE MATHEMATICS ALGORITHMS AND APPLICATIONS, 2022, 14 (07)