Fault-Tolerant Scheduling for Real-Time Scientific Workflows with Elastic Resource Provisioning in Virtualized Clouds

Cited by: 98
Authors
Zhu, Xiaomin [1]
Wang, Ji [1]
Guo, Hui [2]
Zhu, Dakai [3]
Yang, Laurence T. [4]
Liu, Ling [5]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Informat Syst Engn Lab, Changsha 410073, Hunan, Peoples R China
[2] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
[3] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
[4] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS B2G 2W5, Canada
[5] Georgia Inst Technol, Coll Comp, 266 Ferst Dr, Atlanta, GA 30332 USA
Funding
National Natural Science Foundation of China;
Keywords
Virtualized clouds; fault-tolerant scheduling; primary-backup model; overlapping; VM migration; TASKS; ALGORITHM;
DOI
10.1109/TPDS.2016.2543731
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Clouds are becoming an important platform for scientific workflow applications. However, with many nodes deployed in clouds, managing the reliability of resources becomes a critical issue, especially for real-time scientific workflow execution, where deadlines must be satisfied. Fault tolerance in clouds is therefore essential. Primary-backup (PB) scheduling is a popular fault-tolerance technique that has been used effectively in cluster and grid computing. However, applying this technique to real-time workflows in a virtualized cloud is much more complicated and has rarely been studied. In this paper, we address this problem. We first establish a real-time workflow fault-tolerant model that extends the traditional PB model by incorporating cloud characteristics. Based on this model, we develop approaches for task allocation and message transmission to ensure that faults can be tolerated during workflow execution. Finally, we propose a dynamic fault-tolerant scheduling algorithm, FASTER, for real-time workflows in the virtualized cloud. FASTER has three key features: 1) it employs a backward shifting method to make full use of idle resources and incorporates task overlapping and VM migration for high resource utilization, 2) it applies a vertical/horizontal scaling-up technique to quickly provision resources for a burst of workflows, and 3) it uses a vertical scaling-down scheme to avoid unnecessary and ineffective resource changes caused by fluctuating workflow requests. We evaluate FASTER with synthetic workflows and workflows collected from real scientific and business applications, and compare it with six baseline algorithms. The experimental results demonstrate that FASTER can effectively improve resource utilization and schedulability even in the presence of node failures in virtualized clouds.
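The primary-backup idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the FASTER algorithm: it assumes a single independent task, identical hosts, a passive backup that starts only after the primary would have finished, and no overlapping, VM migration, or scaling. The names `Task`, `schedule_primary_backup`, and `host_free_at` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Slot = Tuple[str, float, float]  # (host, start time, finish time)


@dataclass
class Task:
    name: str
    exec_time: float  # assumed identical on every host (simplification)
    arrival: float
    deadline: float


def schedule_primary_backup(
    task: Task, host_free_at: Dict[str, float]
) -> Optional[Dict[str, Slot]]:
    """Place a primary and a passive backup copy on two different hosts.

    Returns None when the task cannot be guaranteed to meet its deadline
    even if the host running the primary fails.
    """
    if len(host_free_at) < 2:
        return None  # one host cannot tolerate its own failure

    # Primary copy: earliest-available host (ties broken by host name).
    p_host = min(host_free_at, key=lambda h: (host_free_at[h], h))
    p_start = max(task.arrival, host_free_at[p_host])
    p_finish = p_start + task.exec_time
    if p_finish > task.deadline:
        return None

    # Backup copy: a different host, starting no earlier than the primary's
    # finish time (assumption: a failure is detected by that time).
    others = {h: t for h, t in host_free_at.items() if h != p_host}
    b_host = min(others, key=lambda h: (others[h], h))
    b_start = max(p_finish, others[b_host])
    b_finish = b_start + task.exec_time
    if b_finish > task.deadline:
        return None  # the backup could not finish in time after a failure

    return {
        "primary": (p_host, p_start, p_finish),
        "backup": (b_host, b_start, b_finish),
    }


if __name__ == "__main__":
    hosts = {"h1": 0.0, "h2": 1.0}
    print(schedule_primary_backup(Task("t1", 3.0, 0.0, 10.0), hosts))
    print(schedule_primary_backup(Task("t2", 3.0, 0.0, 5.0), hosts))
```

In this sketch a task is accepted only if both copies fit before the deadline, which is the basic schedulability condition of a passive primary-backup scheme; techniques such as overlapping backups aim to reduce the resource cost that this conservative reservation implies.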
Pages: 3501-3517
Page count: 17