RHEEM: Enabling Cross-Platform Data Processing

Cited by: 28
Authors
Agrawal, Divy [2]
Chawla, Sanjay [1]
Contreras-Rojas, Bertty [1]
Elmagarmid, Ahmed [1]
Idris, Yasser [1]
Kaoudi, Zoi [1]
Kruse, Sebastian [3]
Lucas, Ji [1]
Mansour, Essam [1]
Ouzzani, Mourad [1]
Papotti, Paolo [1, 4]
Quiane-Ruiz, Jorge-Arnulfo [1]
Tang, Nan [1]
Thirumuruganathan, Saravanan [1]
Troudi, Anis [1]
Affiliations
[1] HBKU, Qatar Comp Res Inst, Doha, Qatar
[2] UCSB, Santa Barbara, CA 93106 USA
[3] Hasso Plattner Inst, Potsdam, Germany
[4] Eurecom, Biot, France
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Volume 11 / Issue 11
Keywords
EFFICIENT;
DOI
10.14778/3236187.3236195
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different platforms. Addressing this pain and achieving automatic cross-platform data processing is quite challenging: finding the most efficient platform for a given task requires quite good expertise for all the available platforms. We present RHEEM, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms. It not only determines the best platform to run an incoming task, but also splits the task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). It features (i) an interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. Using different real-world applications with RHEEM, we demonstrate how cross-platform data processing can accelerate performance by more than one order of magnitude compared to single-platform data processing.
Pages: 1414-1427
Page count: 14