Leveraging Parallel Data Processing Frameworks with Verified Lifting

Cited by: 6
Authors
Ahmad, Maaz Bin Safeer [1 ]
Cheung, Alvin [1 ]
Affiliation
[1] Univ Washington, Comp Sci & Engn, Seattle, WA 98195 USA
Keywords
MapReduce
DOI
10.4204/EPTCS.229.7
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Many parallel data processing frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten in the domain-specific language that each framework supports. This rewriting, which is tedious and error-prone, also requires developers to choose the framework that best optimizes performance for a specific workload. This paper describes CASPER, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, CASPER uses verified lifting to infer a high-level summary, expressed in our program specification language, that is then compiled for execution on Hadoop. We demonstrate that CASPER automatically translates Java benchmarks into Hadoop. The translated programs execute on average 3.3x faster than the sequential implementations and also scale better to larger datasets.
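To make the idea in the abstract concrete, the following is a minimal sketch of the kind of equivalence that lifting relies on: a sequential Java loop and a map/reduce-style formulation that computes the same result. It is not CASPER's output and does not use the Hadoop API; Java streams stand in for the map and reduce stages, and the class name `LiftingSketch` and both method names are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only: not generated by CASPER, not Hadoop code.
class LiftingSketch {

    // Sequential fragment: count occurrences of each word with a loop.
    static Map<String, Integer> sequentialCount(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Map/reduce-style summary of the same computation:
    //   map:    each word w is emitted as the pair (w, 1)
    //   reduce: values that share a key are combined with integer addition
    static Map<String, Integer> mapReduceCount(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        // Both formulations should agree on every input.
        System.out.println(sequentialCount(words).equals(mapReduceCount(words)));
    }
}
```

Because the reduce step uses addition, which is associative and commutative, the pairs can be combined in any order, which is what allows this form of the computation to be distributed across workers.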
Pages: 67-83
Page count: 17