Cost-based Fault-tolerance for Parallel Data Processing

Cited by: 13
Authors
Salama, Abdallah [1]
Binnig, Carsten [1, 2]
Kraska, Tim [2]
Zamanian, Erfan [2]
Affiliations
[1] Baden Wuerttemberg Cooperat State Univ, Mannheim, Germany
[2] Brown Univ, Providence, RI 02912 USA
DOI
10.1145/2723372.2749437
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
To deal with mid-query failures, parallel data engines (PDEs) today implement different fault-tolerance schemes: (1) parallel databases typically implement fault tolerance in a coarse-grained manner, restarting a query completely when a mid-query failure occurs, and (2) modern MapReduce-style PDEs implement a fine-grained fault-tolerance scheme, which either materializes intermediate results or implements a lineage model to recover from mid-query failures. However, neither scheme efficiently handles mixed workloads that contain both short-running interactive queries and long-running batch queries, nor does either efficiently support a wide range of cluster setups that vary in cluster size and in other parameters such as the mean time between failures. In this paper, we present a novel cost-based fault-tolerance scheme that tackles this issue. In contrast to the existing schemes, our scheme selects a subset of intermediates to materialize such that the total query runtime is minimized under mid-query failures. Our experiments show that our cost-based fault-tolerance scheme outperforms all existing strategies and always selects the sweet spot for short- and long-running queries as well as for different cluster setups.
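The idea of choosing which intermediates to materialize so that expected runtime under failures is minimized can be illustrated with a toy model. The sketch below is not the paper's cost model or algorithm. It assumes a linear pipeline of operators, failures that arrive as a Poisson process with a given mean time between failures, zero recovery overhead, and a restart from the last materialized intermediate after each failure. All function and parameter names (`expected_segment_time`, `expected_runtime`, `best_materialization`, `runtimes`, `mat_costs`, `mtbf`) are hypothetical, and the search is a plain brute-force enumeration.

```python
import math
from itertools import combinations


def expected_segment_time(work, mtbf):
    """Expected time to finish `work` seconds of uninterrupted processing
    when failures follow a Poisson process with mean `mtbf` seconds and
    every failure restarts the segment from its beginning."""
    rate = 1.0 / mtbf
    return math.expm1(rate * work) / rate


def expected_runtime(runtimes, mat_costs, materialized, mtbf):
    """Expected runtime of a linear pipeline (toy model, an assumption).

    runtimes[i]   -- failure-free runtime of operator i
    mat_costs[i]  -- extra time to materialize the output of operator i
    materialized  -- set of operator indices whose output is materialized
    """
    total = 0.0
    segment = 0.0
    last = len(runtimes) - 1
    for i, t in enumerate(runtimes):
        segment += t
        if i in materialized:
            segment += mat_costs[i]
        # A segment ends at a materialized output or at the final operator.
        if i in materialized or i == last:
            total += expected_segment_time(segment, mtbf)
            segment = 0.0
    return total


def best_materialization(runtimes, mat_costs, mtbf):
    """Enumerate every subset of non-final operators and return
    (expected runtime, chosen subset) with the lowest expected runtime."""
    n = len(runtimes)
    candidates = range(n - 1)  # the final result is produced in any case
    best = (float("inf"), frozenset())
    for k in range(n):
        for subset in combinations(candidates, k):
            chosen = frozenset(subset)
            cost = expected_runtime(runtimes, mat_costs, chosen, mtbf)
            if cost < best[0]:
                best = (cost, chosen)
    return best
```

In this toy model, a pipeline that is short relative to the mean time between failures gains nothing from materialization, so the empty subset wins, while a pipeline whose runtime is comparable to the mean time between failures benefits from checkpointing more intermediates. That mirrors the trade-off between short interactive and long batch queries described in the abstract, under the stated assumptions only.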
Pages: 285-297
Number of pages: 13
Related papers
50 in total
  • [31] Fault-Tolerance in Resolvability
    Javaid, Imran
    Salman, Muhammad
    Chaudhry, Muhammad Anwar
    Shokat, Sara
    UTILITAS MATHEMATICA, 2009, 80 : 263 - 275
  • [32] A Novel Parallel Architecture with Fault-Tolerance for Joining Bi-Directional Data Streams in Cloud
    Liu, Xinchun
    Fan, Xiaopeng
    Li, Jing
    2013 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA (CLOUDCOM-ASIA), 2013, : 30 - 37
  • [33] COST-EFFECTIVE AND FLEXIBLE SCHEME FOR SOFTWARE FAULT-TOLERANCE
    BONDAVALLI, A
    DIGIANDOMENICO, F
    XU, J
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 1993, 8 (04): : 234 - 244
  • [34] COST-DRIVEN MACHINE: FAULT-TOLERANCE AND LEARNING.
    Eberbach, Eugeniusz
    A.M.S.E. review, 1987, 6 (02): : 37 - 47
  • [35] Low Cost Rollback to Improve Fault-Tolerance in VLSI Circuits
    Bonnoit, Thierry
    Zergainoh, Nacer-Eddine
    Nicolaidis, Michael
    Velazco, Raoul
    2017 IEEE 8TH LATIN AMERICAN SYMPOSIUM ON CIRCUITS & SYSTEMS (LASCAS), 2017,
  • [36] Fault-Tolerance of Hierarchical Power Management in Data Center
    Li, Jianxiang
    Lv, Yinan
    Kong, Xiangzhen
    INDUSTRIAL INSTRUMENTATION AND CONTROL SYSTEMS II, PTS 1-3, 2013, 336-338 : 2555 - 2558
  • [37] Byzantine Fault-Tolerance Consensus Algorithm Based on
    Li, Shuzhi
    Xiong, Weizhi
    Deng, Xiaohong
    Wang, Zhiqiang
    Liu, Hunwen
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2023, 45 (07) : 2484 - 2493
  • [38] An Efficient Intermediate Data Fault-Tolerance Approach in the Cloud
    Song, Baoyan
    Ren, Cai
    Li, Xuecheng
    Ding, Linlin
    2014 11TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA), 2014, : 203 - 206
  • [39] Low-cost fault-tolerance for mobile nodes in mobile IP based systems
    Ahn, J
    Hwang, C
    21ST INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS, PROCEEDINGS, 2001, : 508 - 513
  • [40] A comparative cost analysis of fault-tolerance mechanisms for availability on the cloud
    Sampaio, Altino M.
    Barbosa, Jorge G.
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2018, 19 : 315 - 323