Fault-Tolerant Dynamic Task Graph Scheduling

Cited by: 17
Authors
Kurt, Mehmet Can [1]
Krishnamoorthy, Sriram [2]
Agrawal, Kunal [3]
Agrawal, Gagan [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Pacific Northwest Natl Lab, Richland, WA 99352 USA
[3] Washington Univ, St Louis, MO 63110 USA
Source
SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS | 2014
Keywords
DAG; task graphs; Cilk; work stealing; fault tolerance; algorithm
DOI
10.1109/SC.2014.64
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper, we present an approach to fault-tolerant execution of dynamic task graphs scheduled using work stealing. In particular, we focus on selective and localized recovery of tasks in the presence of soft faults. From users, we elicit the basic task graph structure in terms of successor and predecessor relationships. The work-stealing-based algorithm to schedule such a task graph is augmented to enable recovery when the data and metadata associated with a task get corrupted. We use this redundancy, and knowledge of the task graph structure, to selectively recover from faults with low space and time overheads. We show that the fault-tolerant design retains the essential properties of the underlying work-stealing-based task scheduling algorithm, and that the fault-tolerant execution is asymptotically optimal when task re-execution is taken into account. Experimental evaluation demonstrates the low cost of recovery under various fault scenarios.
Pages: 719-730
Number of pages: 12