Deep reinforcement learning for fault-tolerant workflow scheduling in cloud environment

Cited by: 18
Authors
Dong, Tingting [1, 2]
Xue, Fei [1 ]
Tang, Hengliang [1 ]
Xiao, Chuangbai [2 ]
Affiliations
[1] Beijing Wuzi Univ, Beijing, Peoples R China
[2] Beijing Univ Technol, Beijing, Peoples R China
Keywords
Fault-tolerant strategy; Workflow scheduling; Resubmission; Replication; Deep reinforcement learning; ENERGY; COST;
DOI
10.1007/s10489-022-03963-w
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cloud computing is widely used in many fields because it can provide sufficient computing resources to serve users' demands (workflows) quickly and effectively. However, resource failure is inevitable, so accounting for fault tolerance is one of the challenges in optimizing workflow scheduling. Most previous algorithms are based on failure prediction and fault-tolerant strategies, which can cause time delays and waste resources. This paper combines these two methods in a deep reinforcement learning framework and proposes an adaptive fault-tolerant workflow scheduling framework called RLFTWS, which aims to minimize the makespan and the resource usage rate. In this framework, fault-tolerant workflow scheduling is formulated as a Markov decision process whose two actions are the resubmission strategy and the replication strategy. A heuristic algorithm is designed to allocate and execute tasks according to the selected fault-tolerant strategy. A double deep Q-network (DDQN) is developed to select the fault-tolerant strategy adaptively for each task under the current environment state, so the selection relies not only on prediction but also on learning from interaction with the environment. Simulation results show that RLFTWS balances the makespan and the resource usage rate efficiently while achieving fault tolerance.
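The abstract names a double deep Q-network with two actions (resubmission and replication) but gives no formulas. As a rough illustration only, the standard double DQN target can be sketched as below, assuming the usual textbook form in which the online network picks the next action and the target network evaluates it. The function names, the discount factor, and the example Q-values are hypothetical and are not taken from the paper; its state encoding, reward, and network design are not described here.

```python
from typing import Sequence

# The two fault-tolerant strategies named in the abstract, used as the action set.
ACTIONS = ("resubmission", "replication")


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def ddqn_target(
    reward: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.9,  # hypothetical discount factor, not from the paper
    done: bool = False,
) -> float:
    """Standard double DQN target for a single transition.

    The online network's Q-values choose the next action; the target
    network's Q-values score that chosen action.
    """
    if done:
        return reward
    best_action = argmax(q_online_next)
    return reward + gamma * q_target_next[best_action]


def greedy_strategy(q_values: Sequence[float]) -> str:
    """Pick the fault-tolerant strategy with the highest estimated value."""
    return ACTIONS[argmax(q_values)]


if __name__ == "__main__":
    # Made-up Q-values for the state that follows scheduling one task.
    q_online = [0.2, 0.8]
    q_target = [0.5, 0.3]
    print(greedy_strategy(q_online))
    print(ddqn_target(1.0, q_online, q_target))
```

Decoupling action selection from action evaluation is what distinguishes double DQN from plain DQN, which would use the maximum of the target network's Q-values directly.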
Pages: 9916 - 9932
Page count: 17
Related papers
50 in total
  • [1] Deep reinforcement learning for fault-tolerant workflow scheduling in cloud environment
    Tingting Dong
    Fei Xue
    Hengliang Tang
    Chuangbai Xiao
    Applied Intelligence, 2023, 53 : 9916 - 9932
  • [2] Workflow scheduling based on deep reinforcement learning in the cloud environment
    Tingting Dong
    Fei Xue
    Chuangbai Xiao
    Jiangjiang Zhang
    Journal of Ambient Intelligence and Humanized Computing, 2021, 12 : 10823 - 10835
  • [3] Workflow scheduling based on deep reinforcement learning in the cloud environment
    Dong, Tingting
    Xue, Fei
    Xiao, Chuangbai
    Zhang, Jiangjiang
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 12 (12) : 10823 - 10835
  • [4] Deep Reinforcement Learning for Dynamic Workflow Scheduling in Cloud Environment
    Dong, Tingting
    Xue, Fei
    Xiao, Changbai
    Zhang, Jiangjiang
    2021 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC 2021), 2021, : 107 - 115
  • [5] Fault-tolerant elastic scheduling algorithm for workflow in Cloud systems
    Ding, Yongsheng
    Yao, Guangshun
    Hao, Kuangrong
    INFORMATION SCIENCES, 2017, 393 : 47 - 65
  • [6] Fault-tolerant scheduling algorithm for service workflow in MEC environment
    Yuan Y.
    Huang X.
    Yu D.
    Li Z.
    Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2021, 27 (06): : 1683 - 1702
  • [7] Using Imbalance Characteristic for Fault-Tolerant Workflow Scheduling in Cloud Systems
    Yao, Guangshun
    Ding, Yongsheng
    Hao, Kuangrong
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (12) : 3671 - 3683
  • [8] A Novel Fault-Tolerant Aware Task Scheduler Using Deep Reinforcement Learning in Cloud Computing
    Krishna, Mallu Shiva Rama
    Mangalampalli, Sudheer
    APPLIED SCIENCES-BASEL, 2023, 13 (21):
  • [9] A Reinforcement Learning Based Workflow Application Scheduling Approach in Dynamic Cloud Environment
    Wei, Yi
    Kudenko, Daniel
    Liu, Shijun
    Pan, Li
    Wu, Lei
    Meng, Xiangxu
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2017, 2018, 252 : 120 - 131
  • [10] Fault-Tolerant Workflow Scheduling Using Spot Instances on Clouds
    Poola, Deepak
    Ramamohanarao, Kotagiri
    Buyya, Rajkumar
    2014 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, 2014, 29 : 523 - 533