Leveraging Parallel Data Processing Frameworks with Verified Lifting

Cited by: 6
Authors
Ahmad, Maaz Bin Safeer [1]
Cheung, Alvin [1]
Affiliations
[1] Univ Washington, Comp Sci & Engn, Seattle, WA 98195 USA
Keywords
MapReduce
DOI
10.4204/EPTCS.229.7
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten in the domain-specific languages that each framework supports. This rewriting, which is tedious and error-prone, also requires developers to choose the framework that best optimizes performance for a specific workload. This paper describes CASPER, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, CASPER uses verified lifting to infer a high-level summary, expressed in the authors' program specification language, that is then compiled for execution on Hadoop. The authors demonstrate that CASPER automatically translates Java benchmarks into Hadoop. The translated results execute on average 3.3x faster than the sequential implementations and also scale better to larger datasets.
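To make the idea of retargeting concrete, the following is a minimal sketch of the kind of rewrite the abstract describes: a sequential Java loop next to a map/reduce-style formulation that computes the same result. It is an illustration only, not CASPER's output and not its specification language. It uses Java streams instead of the Hadoop API so that it stays self-contained, and the names LiftingSketch, sequentialCount, and mapReduceCount are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration; not taken from the paper.
class LiftingSketch {

    // Sequential fragment: counts occurrences of each word with a plain loop.
    static Map<String, Integer> sequentialCount(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Map/reduce-style formulation of the same computation:
    // "map" pairs each word with the value 1 (key = word),
    // "reduce" sums the values that share a key.
    static Map<String, Integer> mapReduceCount(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.toMap(
                        w -> w,          // key emitted by the map step
                        w -> 1,          // value emitted by the map step
                        Integer::sum));  // reduce step: merge values per key
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        // Both formulations should agree on every input.
        System.out.println(sequentialCount(words).equals(mapReduceCount(words)));
    }
}
```

Because the reduce step here is addition, which is associative and commutative, the per-key sums do not depend on the order in which a parallel runtime combines partial results. A property of this kind is what allows a loop like the one above to be expressed in the MapReduce paradigm at all.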
Pages: 67-83
Page count: 17