Cost-based Fault-tolerance for Parallel Data Processing

Cited by: 13
Authors
Salama, Abdallah [1]
Binnig, Carsten [1, 2]
Kraska, Tim [2]
Zamanian, Erfan [2]
Affiliations
[1] Baden Wuerttemberg Cooperat State Univ, Mannheim, Germany
[2] Brown Univ, Providence, RI 02912 USA
DOI
10.1145/2723372.2749437
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
To deal with mid-query failures, parallel data engines (PDEs) today implement one of two kinds of fault-tolerance scheme: (1) parallel databases typically implement fault-tolerance in a coarse-grained manner, restarting a query completely when a mid-query failure occurs, and (2) modern MapReduce-style PDEs implement a fine-grained fault-tolerance scheme, which either materializes intermediate results or uses a lineage model to recover from mid-query failures. However, neither scheme efficiently handles mixed workloads that combine short-running interactive queries with long-running batch queries, nor does either efficiently support a wide range of cluster setups that vary in cluster size and in other parameters such as the mean time between failures. In this paper, we present a novel cost-based fault-tolerance scheme that tackles this issue. In contrast to the existing schemes, our scheme selects a subset of intermediates to materialize such that the total query runtime under mid-query failures is minimized. Our experiments show that our cost-based fault-tolerance scheme outperforms all existing strategies and always selects the sweet spot for short- and long-running queries as well as for different cluster setups.
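The idea of choosing which intermediates to materialize can be illustrated with a toy model. The sketch below is not the cost model from the paper. It assumes a linear pipeline of operators, failures that arrive as a Poisson process with rate 1/MTBF, and recovery that restarts from the most recent materialized intermediate; under those assumptions the expected time to finish a segment of length T is (e^(λT) - 1)/λ. All function and parameter names (`expected_segment_time`, `expected_runtime`, `best_materialization`, `op_times`, `mat_costs`) are hypothetical, and the subset is found by brute-force enumeration.

```python
import math
from itertools import combinations


def expected_segment_time(work, failure_rate):
    """Expected time to finish `work` seconds of processing when failures
    arrive as a Poisson process and each failure restarts the segment.
    Assumption of this sketch: recovery itself takes no extra time."""
    if failure_rate == 0:
        return work
    return math.expm1(failure_rate * work) / failure_rate


def expected_runtime(op_times, mat_costs, materialized, failure_rate):
    """Expected runtime of a linear pipeline. `materialized` holds the
    indices of operators whose output is written out; a failure only
    repeats the work done since the last materialized output."""
    total = 0.0
    segment = 0.0
    last = len(op_times) - 1
    for i, t in enumerate(op_times):
        segment += t
        if i in materialized and i != last:
            segment += mat_costs[i]  # writing the intermediate also costs time
            total += expected_segment_time(segment, failure_rate)
            segment = 0.0
    total += expected_segment_time(segment, failure_rate)
    return total


def best_materialization(op_times, mat_costs, failure_rate):
    """Enumerate every subset of intermediates (exponential in the number
    of operators, so only suitable for tiny plans) and return the subset
    with the lowest expected runtime together with that runtime."""
    candidates = range(len(op_times) - 1)  # the final output is not an intermediate
    best = None
    for k in range(len(op_times)):
        for subset in combinations(candidates, k):
            cost = expected_runtime(op_times, mat_costs, set(subset), failure_rate)
            if best is None or cost < best[1]:
                best = (set(subset), cost)
    return best


# Long-running plan, cheap materialization, MTBF of 100 s.
long_plan = best_materialization([100, 100, 100], [1, 1, 1], 0.01)
# Short-running plan, expensive materialization, same MTBF.
short_plan = best_materialization([1, 1, 1], [5, 5, 5], 0.01)
```

In this toy setting the long-running plan materializes both intermediates, while the short-running plan materializes none, which mirrors the trade-off between short interactive and long batch queries that the abstract describes.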
Pages: 285-297
Number of pages: 13