Leveraging Parallel Data Processing Frameworks with Verified Lifting

被引:6
|
作者
Ahmad, Maaz Bin Safeer [1 ]
Cheung, Alvin [1 ]
机构
[1] Univ Washington, Comp Sci & Engn, Seattle, WA 98195 USA
关键词
MAPREDUCE;
D O I
10.4204/EPTCS.229.7
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten to the domain-specific languages that each framework supports. This rewriting-tedious and error-prone-also requires developers to choose the framework that best optimizes performance given a specific workload. This paper describes CASPER, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, CASPER uses verified lifting to infer a high-level summary expressed in our program specification language that is then compiled for execution on Hadoop. We demonstrate that CASPER automatically translates Java benchmarks into Hadoop. The translated results execute on average 3:3x faster than the sequential implementations and scale better, as well, to larger datasets.
引用
收藏
页码:67 / 83
页数:17
相关论文
共 50 条
  • [1] Optimizing Data Processing Through Verified Lifting
    Ahmad, Maaz Bin Safeer
    ProQuest Dissertations and Theses Global, 2022,
  • [2] Optimizing Data-Intensive Applications Automatically By Leveraging Parallel Data Processing Frameworks
    Ahmad, Maaz Bin Safeer
    Cheung, Alvin
    SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017, : 1675 - 1678
  • [3] Verified Lifting of Stencil Computations
    Kamil, Shoaib
    Cheung, Alvin
    Itzhaky, Shachar
    Solar-Lezama, Armando
    ACM SIGPLAN NOTICES, 2016, 51 (06) : 711 - 726
  • [4] Distributed frameworks and parallel algorithms for processing large-scale geographic data
    Hawick, KA
    Coddington, PD
    James, HA
    PARALLEL COMPUTING, 2003, 29 (10) : 1297 - 1333
  • [5] Private Search Over Big Data Leveraging Distributed File System and Parallel Processing
    Selcuk, Ayse
    Orencik, Cengiz
    Savas, Erkay
    CLOUD COMPUTING 2015: THE SIXTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, GRIDS, AND VIRTUALIZATION, 2015, : 116 - 121
  • [6] Advanced Parallel Processing of Lyapunov Exponents Verified by Practical Circuit
    Gotthans, Tomas
    Petrzela, Jiri
    Hrubos, Zdenek
    PROCEEDINGS OF THE 21ST INTERNATIONAL CONFERENCE - RADIOELEKTRONIKA 2011, 2011, : 405 - 408
  • [7] Katara: Synthesizing CRDTs with Verified Lifting
    Laddad, Shadaj
    Power, Conor
    Milano, Mae
    Cheung, Alvin
    Hellerstein, Joseph M.
    PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 2022, 6 (OOPSLA):
  • [8] BWoS: Formally Verified Block-basedWork Stealing for Parallel Processing
    Wang, Jiawei
    Trach, Bohdan
    Fu, Ming
    Behrens, Diogo
    Schwender, Jonathan
    Liu, Yutao
    Lei, Jitang
    Vafeiadis, Viktor
    Haertig, Hermann
    Chen, Haibo
    PROCEEDINGS OF THE 17TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2023, 2023, : 833 - 850
  • [9] Easing the heavy lifting of bulk data processing
    IBM Data Manag. Mag., 2013, 7
  • [10] Parallel Verified Linear System Solver for Uncertain Input Data
    Kolberg, Mariana
    Dorn, Marcio
    Fernandes, Luiz Gustavo
    Bohlender, Gerd
    20TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING, PROCEEDINGS, 2008, : 89 - +