Fault-Tolerant Dynamic Task Graph Scheduling

Cited by: 17
Authors
Kurt, Mehmet Can [1]
Krishnamoorthy, Sriram [2]
Agrawal, Kunal [3]
Agrawal, Gagan [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Pacific Northwest Natl Lab, Richland, WA 99352 USA
[3] Washington Univ, St Louis, MO 63110 USA
Keywords
DAG; task graphs; Cilk; work stealing; fault tolerance; algorithm
DOI
10.1109/SC.2014.64
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In this paper, we present an approach to fault-tolerant execution of dynamic task graphs scheduled using work stealing. In particular, we focus on selective and localized recovery of tasks in the presence of soft faults. From users, we elicit the basic task graph structure in terms of successor and predecessor relationships. The work-stealing-based algorithm to schedule such a task graph is augmented to enable recovery when the data and metadata associated with a task get corrupted. We use this redundancy, and knowledge of the task graph structure, to selectively recover from faults with low space and time overheads. We show that the fault tolerant design retains the essential properties of the underlying work stealing-based task scheduling algorithm, and that the fault tolerant execution is asymptotically optimal when task re-execution is taken into account. Experimental evaluation demonstrates the low cost of recovery under various fault scenarios.
Pages: 719-730
Number of pages: 12