RHEEM: Enabling Cross-Platform Data Processing

Times cited: 28
Authors
Agrawal, Divy [2]
Chawla, Sanjay [1]
Contreras-Rojas, Bertty [1]
Elmagarmid, Ahmed [1]
Idris, Yasser [1]
Kaoudi, Zoi [1]
Kruse, Sebastian [3]
Lucas, Ji [1]
Mansour, Essam [1]
Ouzzani, Mourad [1]
Papotti, Paolo [1, 4]
Quiane-Ruiz, Jorge-Arnulfo [1]
Tang, Nan [1]
Thirumuruganathan, Saravanan [1]
Troudi, Anis [1]
Affiliations
[1] HBKU, Qatar Computing Research Institute, Doha, Qatar
[2] UCSB, Santa Barbara, CA 93106, USA
[3] Hasso Plattner Institute, Potsdam, Germany
[4] Eurecom, Biot, France
Source
Proceedings of the VLDB Endowment, 2018, Vol. 11, No. 11
Keywords
EFFICIENT
DOI
10.14778/3236187.3236195
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different platforms. Addressing this pain and achieving automatic cross-platform data processing is challenging: finding the most efficient platform for a given task requires solid expertise in all of the available platforms. We present RHEEM, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms. It not only determines the best platform to run an incoming task, but also splits the task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). It features (i) an interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. Using different real-world applications with RHEEM, we demonstrate how cross-platform data processing can speed up execution by more than an order of magnitude compared to single-platform data processing.
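The abstract's central idea, splitting a task into subtasks and assigning each one to the platform that minimizes overall cost, can be illustrated with a toy model. The sketch below is a simplification under stated assumptions, not RHEEM's optimizer or API: it assumes a linear chain of operators, a known cost estimate for each operator on each platform, and a fixed cost for moving intermediate data between two platforms, and it finds the cheapest assignment by dynamic programming. The function `assign_platforms`, the platform labels, and all cost values are hypothetical and made up purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical cost estimates of one operator on each candidate platform.
OpCosts = Dict[str, float]


def assign_platforms(
    op_costs: List[OpCosts],
    move_cost: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Toy cost-based platform assignment (illustrative assumption only).

    Picks one platform per operator of a linear pipeline, minimizing the
    sum of operator costs plus data-movement costs. move_cost[(a, b)] is
    the assumed cost of moving an intermediate result from platform a to
    platform b; staying on the same platform is treated as free.
    """
    if not op_costs:
        return 0.0, []
    # best[p] = (cost, plan) of the cheapest plan so far whose latest
    # operator runs on platform p.
    best = {p: (c, [p]) for p, c in op_costs[0].items()}
    for costs in op_costs[1:]:
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for p, c in costs.items():
            candidates = []
            for q, (prev_cost, plan) in best.items():
                hop = 0.0 if q == p else move_cost[(q, p)]
                candidates.append((prev_cost + hop + c, plan + [p]))
            nxt[p] = min(candidates, key=lambda t: t[0])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Made-up example: a cheap filter followed by an iterative step.
pipeline = [
    {"postgres": 2.0, "spark": 10.0},  # filter: cheap where the data lives
    {"postgres": 50.0, "spark": 8.0},  # iterative step: cheaper on Spark
]
move = {("postgres", "spark"): 3.0, ("spark", "postgres"): 3.0}
cost, plan = assign_platforms(pipeline, move)
print(cost, plan)  # → 13.0 ['postgres', 'spark']
```

With these invented numbers, running everything on one platform costs 52.0 (postgres only) or 18.0 (spark only), while switching platforms after the first operator costs 13.0, which is the kind of mixed plan the abstract describes. The data-movement term is what keeps such an optimizer from switching platforms at every step.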
Pages: 1414-1427
Number of pages: 14
Related papers
  • [1] Cross-Platform Data Processing: Use Cases and Challenges
    Kaoudi, Zoi
    Quiane-Ruiz, Jorge-Arnulfo
    2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018, pp. 1723-1726
  • [2] GPRStudio: An Extensible Cross-Platform GPR Data Processing Tool
    Ozkan, Esra
    Ozkan, Ersin
    Nazli, Hakki
    Sezgin, Mehmet
    Detection and Sensing of Mines, Explosive Objects, and Obscured Targets XXVI, 2021, Vol. 11750
  • [3] NMRFx Processor: a cross-platform NMR data processing program
    Norris, Michael
    Fetler, Bayard
    Marchant, Jan
    Johnson, Bruce A.
    Journal of Biomolecular NMR, 2016, Vol. 65, No. 3-4, pp. 205-216
  • [4] Rheem: Enabling Multi-Platform Task Execution
    Agrawal, Divy
    Ba, Lamine
    Berti-Equille, Laure
    Chawla, Sanjay
    Elmagarmid, Ahmed
    Hammady, Hossam
    Idris, Yasser
    Kaoudi, Zoi
    Khayyat, Zuhair
    Kruse, Sebastian
    Ouzzani, Mourad
    Papotti, Paolo
    Quiane-Ruiz, Jorge-Arnulfo
    Tang, Nan
    Zaki, Mohammed J.
    SIGMOD'16: Proceedings of the 2016 International Conference on Management of Data, 2016, pp. 2069-2072
  • [5] A flexible cross-platform single-cell data processing pipeline
    Battenberg, Kai
    Kelly, S. Thomas
    Ras, Radu Abu
    Hetherington, Nicola A.
    Hayashi, Makoto
    Minoda, Aki
    Nature Communications, 2022, Vol. 13, No. 1
  • [6] Optimizing Cross-Platform Data Movement
    Kruse, Sebastian
    Kaoudi, Zoi
    Quiane-Ruiz, Jorge-Arnulfo
    Chawla, Sanjay
    Naumann, Felix
    Contreras-Rojas, Bertty
    2019 IEEE 35th International Conference on Data Engineering (ICDE 2019), 2019, pp. 1642-1645
  • [7] VComputeLib: Enabling Cross-Platform GPGPU on Mobile and Embedded GPUs
    Mammeri, Nadjib
    Juurlink, Ben
    17th International Conference on Advances in Mobile Computing & Multimedia (MoMM2019), 2019, pp. 242-251
  • [8] FEDKIT: Enabling Cross-Platform Federated Learning for Android and iOS
    He, Sichang
    Tang, Beilong
    Zhang, Boyan
    Shao, Jiaoqi
    Ouyang, Xiaomin
    Nugraha, Daniel Nata
    Luo, Bing
    IEEE INFOCOM 2024 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS 2024), 2024