Leveraging Parallel Data Processing Frameworks with Verified Lifting

Times cited: 6
Authors
Ahmad, Maaz Bin Safeer [1]
Cheung, Alvin [1]
Affiliation
[1] University of Washington, Computer Science & Engineering, Seattle, WA 98195, USA
Keywords
MapReduce
DOI
10.4204/EPTCS.229.7
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Subject classification code
081202 ;
Abstract
In recent years, many parallel data processing frameworks have been proposed that give sequential programs access to parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten in the domain-specific languages that each framework supports. This rewriting is tedious and error-prone, and it also requires developers to choose the framework that best optimizes performance for a specific workload. This paper describes CASPER, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, CASPER uses verified lifting to infer a high-level summary, expressed in our program specification language, which is then compiled for execution on Hadoop. We demonstrate that CASPER automatically translates Java benchmarks into Hadoop programs. The translated programs execute on average 3.3x faster than the sequential implementations and also scale better to larger datasets.
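The kind of rewrite the abstract describes (a sequential loop in, a MapReduce-style program out) can be illustrated with a small hypothetical Java sketch. It is not CASPER's output, and it reproduces neither CASPER's program specification language nor its Hadoop code generation: the `map` and `reduce` functions, the class name `WordCountLifting`, and the in-memory shuffle are illustrative stand-ins, and word count is used only because it is a standard MapReduce example. The final comparison runs both versions on a single input; it is a test, not a proof of equivalence of the kind verified lifting relies on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountLifting {

    // Sequential input: one loop that updates a map in place.
    static Map<String, Integer> sequential(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Hypothetical map function: emit the pair (word, 1) for each element.
    static List<Map.Entry<String, Integer>> map(String word) {
        return List.of(Map.entry(word, 1));
    }

    // Hypothetical reduce function: sum all values emitted for one key.
    // Addition is associative and commutative, so value order does not matter.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // In-memory stand-in for a MapReduce job: map, shuffle (group by key), reduce.
    static Map<String, Integer> mapReduce(List<String> words) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String w : words) {
            for (Map.Entry<String, Integer> kv : map(w)) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("to", "be", "or", "not", "to", "be");
        Map<String, Integer> expected = sequential(input);
        Map<String, Integer> actual = mapReduce(input);
        // Compares the two versions on one input only; this is a test, not a proof.
        System.out.println(expected.equals(actual) + " " + new TreeMap<>(actual));
        // prints: true {be=2, not=1, or=1, to=2}
    }
}
```

The translation is sound in this toy case because the per-key reduction (integer addition) gives the same result however the shuffle groups and orders the values, which is the kind of property a translation to MapReduce has to respect.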
Pages: 67-83
Page count: 17