IncApprox: A Data Analytics System for Incremental Approximate Computing

Cited by: 32
Authors
Krishnan, Dhanya R. [1]
Do Le Quoc [1]
Bhatotia, Pramod [1]
Fetzer, Christof [1]
Rodrigues, Rodrigo [2, 3]
Affiliations
[1] Tech Univ Dresden, Dresden, Germany
[2] Univ Lisbon, IST, Lisbon, Portugal
[3] INESC ID, Lisbon, Portugal
DOI
10.1145/2872427.2883026
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Incremental and approximate computations are increasingly being adopted for data analytics to achieve low-latency execution and efficient utilization of computing resources. Incremental computation updates the output incrementally instead of re-computing everything from scratch for successive runs of a job with input changes. Approximate computation returns an approximate output for a job instead of the exact output. Both paradigms rely on computing over a subset of data items instead of computing over the entire dataset, but they differ in their means for skipping parts of the computation. Incremental computing relies on the memoization of intermediate results of sub-computations, and reusing these memoized results across jobs. Approximate computing relies on representative sampling of the entire dataset to compute over a subset of data items. In this paper, we observe that these two paradigms are complementary, and can be married together! Our idea is quite simple: design a sampling algorithm that biases the sample selection toward the memoized data items from previous runs. To realize this idea, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error. We implemented our algorithm in a data analytics system called IncApprox based on Apache Spark Streaming. Our evaluation using micro-benchmarks and real-world case studies shows that IncApprox achieves the benefits of both incremental and approximate computing.
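The core idea in the abstract, a stratified sample that prefers items whose results are already memoized, can be illustrated with a minimal Python sketch. This is not the IncApprox algorithm or its Spark Streaming implementation: the function names (`biased_stratified_sample`, `approximate_sum`), the `(stratum, item_id, value)` input layout, and the proportional allocation rule are all assumptions made for illustration, and the sketch computes no error bounds.

```python
import random
from collections import defaultdict


def biased_stratified_sample(window, memo, sample_size, rng):
    """Sample each stratum of `window`, preferring items already in `memo`.

    window: list of (stratum, item_id, value) tuples (hypothetical layout)
    memo:   dict mapping item_id -> result memoized by an earlier run
    Returns {stratum: (chosen_items, stratum_size)}.
    """
    strata = defaultdict(list)
    for stratum, item_id, value in window:
        strata[stratum].append((item_id, value))

    total = len(window)
    sample = {}
    for stratum, items in strata.items():
        # Proportional allocation (an assumption), at least one item per stratum.
        k = min(len(items), max(1, round(sample_size * len(items) / total)))
        reused = [it for it in items if it[0] in memo]
        fresh = [it for it in items if it[0] not in memo]
        rng.shuffle(reused)
        rng.shuffle(fresh)
        # Memoized items come first, so they fill the sample before fresh ones.
        sample[stratum] = ((reused + fresh)[:k], len(items))
    return sample


def approximate_sum(sample, memo, f):
    """Estimate sum(f(value)) over the window, reusing memoized results.

    Returns (estimate, number_of_items_actually_computed).
    """
    estimate = 0.0
    computed = 0
    for chosen, stratum_size in sample.values():
        partial = 0.0
        for item_id, value in chosen:
            if item_id not in memo:
                memo[item_id] = f(value)  # only non-memoized items cost work
                computed += 1
            partial += memo[item_id]
        # Scale the sampled partial sum up to the full stratum size.
        estimate += partial * stratum_size / len(chosen)
    return estimate, computed
```

Calling `biased_stratified_sample` on successive overlapping windows with a shared `memo` dictionary means the second run computes `f` only for sampled items that were not sampled before. A real system would also need to account for the statistical effect of the bias and report error bounds, which this sketch leaves out.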
Pages: 1133-1144
Number of pages: 12
Related papers
50 in total
  • [1] STREAMAPPROX: Approximate Computing for Stream Analytics
    Do Le Quoc
    Chen, Ruichuan
    Bhatotia, Pramod
    Fetzer, Christof
    Hilt, Volker
    Strufe, Thorsten
    [J]. PROCEEDINGS OF THE 2017 INTERNATIONAL MIDDLEWARE CONFERENCE (MIDDLEWARE'17), 2017, : 185 - 197
  • [2] ApproxIoT: Approximate Analytics for Edge Computing
    Wen, Zhenyu
    Do Le Quoc
    Bhatotia, Pramod
    Chen, Ruichuan
    Lee, Myungjin
    [J]. 2018 IEEE 38TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2018, : 411 - 421
  • [3] Approximate Computation for Big Data Analytics
    Ma, Shuai
    [J]. DATABASES THEORY AND APPLICATIONS, ADC 2018, 2018, 10837 : XVIII - XVIII
  • [4] Incremental computing with data structures
    Morihata, Akimasa
    [J]. SCIENCE OF COMPUTER PROGRAMMING, 2018, 164 : 18 - 36
  • [5] Incremental and Parallel Analytics on Astrophysical Data Streams
    Mishin, Dmitry
    Budavari, Tamas
    Szalay, Alexander
    Ahmad, Yanif
    [J]. 2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1078 - 1086
  • [6] Incremental Partitioning for Efficient Spatial Data Analytics
    Vu, Tin
    Eldawy, Ahmed
    Hristidis, Vagelis
    Tsotras, Vassilis
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 15 (03): : 713 - 726
  • [7] Efficient Incremental Data Analytics with Apache Spark
    Gholamian, Sina
    Golab, Wojciech
    Ward, Paul A. S.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 2859 - 2868
  • [8] Health Monitoring System by Prognotive Computing using Big Data Analytics
    Srivathsan, M.
    Arjun, Yogesh K.
    [J]. BIG DATA, CLOUD AND COMPUTING CHALLENGES, 2015, 50 : 602 - 609
  • [9] APPROXIMATE COMPUTING BASED LOW-POWER FPGA DESIGN FOR BIG DATA ANALYTICS IN CLOUD ENVIRONMENTS
    Dova, Murali
    Sandi, Anuradha M
    [J]. Scalable Computing, 2024, 25 (04): : 3152 - 3162
  • [10] Data analytics and cloud computing technologies
    [J]. Hart's E and P, 2021, 96 (04): : 48 - 49