Fault-Tolerant Parallel Execution of Workflows with Deadlines

Cited by: 1
Authors
Eitschberger, Patrick [1]
Keller, Joerg [1]
Affiliation
[1] Fernuniv, Fac Math & Comp Sci, D-58084 Hagen, Germany
Source
2017 25TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2017) | 2017
DOI
10.1109/PDP.2017.30
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Workflows of dependent tasks are a widespread model for parallel applications, often statically scheduled prior to application. Static schedules can tolerate processor failures due to permanent faults by placing duplicate tasks during the scheduling process. Schedules for workflows with deadlines can be extended to include frequency scaling information to optimize energy consumption. Frequency scaling can also be used in case of a fault to minimize its effects on the schedule makespan, albeit at the price of additional energy consumption. We investigate the interplay between these two parameters and quantify the energy increase to be expected for a fault and a given makespan increase. This knowledge enables the user to inform the scheduler about the makespan increase that is tolerable in case of a fault, where tolerable includes both the related performance aspects and the expected increase in energy. To achieve this, we model small task graphs from a benchmark suite as integer linear programs and use a solver to determine energy-optimal schedules for the fault-free case and for all possible fault positions with several levels of makespan increase. We present averages and distributions depending on makespan increase for a processor with a hypothetical power profile. Additionally, we present two heuristics that modify task frequency settings in case of a fault to restrict the makespan increase to a given value. Comparison with optimal frequency settings from the benchmark suite indicates that the heuristics incur only a small energy overhead.
Pages: 78-84
Number of pages: 7