Batch Jobs Load Balancing Scheduling in Cloud Computing Using Distributional Reinforcement Learning

Authors
Li, Tiangang [1]
Ying, Shi [1]
Zhao, Yishi [2]
Shang, Jianga [3, 4]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] China Univ Geosci, Sch Comp Sci, Wuhan 430072, Peoples R China
[3] China Univ Geosci, Sch Comp Sci, Wuhan 430072, Peoples R China
[4] China Univ Geosci, Natl Engn Res Ctr Geog Informat Syst, Wuhan 430072, Peoples R China
Keywords
Batch jobs scheduling; cloud computing; distributional reinforcement learning; load balancing; service level agreement; algorithm; allocation; scheme
DOI
10.1109/TPDS.2023.3334519
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In cloud computing, allocating computing resources to batch jobs so that a dynamic cluster stays load-balanced while user requests are met is an important and challenging task. Most existing studies are based on the deep Q-network (DQN), which uses neural networks to estimate the expected value of the cumulative return in the scheduling process. These value-based DQN algorithms ignore the complete information contained in the value distribution and adapt poorly to time-varying batch jobs and dynamic cluster resources. Therefore, to capture the stochasticity of the scheduling process that arises from the stochastic environment, we use distributional reinforcement learning to model the value distribution of the cumulative return. Specifically, we formalize load balancing scheduling as a multi-objective optimization problem and construct a distributional reinforcement learning model. We then introduce quantile regression to learn the value distribution of the cumulative return during scheduling and propose a dynamic load balancing scheduling algorithm based on distributional reinforcement learning. In addition, we develop a cluster environment for real-time processing of batch jobs, which simulates the arrival of batch jobs and is used to train the scheduling agent. We conduct empirical experiments and detailed analysis using the real Alibaba cluster traces v2018 and v2020. The results show that, compared to the baseline algorithms, the proposed algorithm performs better in terms of cluster load balancing, success rate of instance creation, and average completion time of the tasks. The experimental results on different trace datasets also indicate that the proposed algorithm exhibits excellent scalability.
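The abstract mentions learning the value distribution through quantile regression. As a rough illustration only, the sketch below shows the quantile Huber loss commonly used in quantile-regression-based distributional reinforcement learning. It is not taken from the paper: the function names, the midpoint quantile fractions, and the threshold `kappa` are assumptions made for this example.

```python
from typing import List, Sequence


def quantile_midpoints(n: int) -> List[float]:
    """Midpoint quantile fractions tau_i = (2i + 1) / (2n), an assumed choice."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def huber(u: float, kappa: float = 1.0) -> float:
    """Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(
    theta: Sequence[float], targets: Sequence[float], kappa: float = 1.0
) -> float:
    """Sum over quantile estimates of the mean asymmetric Huber loss.

    theta   -- current quantile estimates of the return (hypothetical input)
    targets -- sampled target returns (hypothetical input)
    """
    taus = quantile_midpoints(len(theta))
    total = 0.0
    for tau, th in zip(taus, theta):
        per_quantile = 0.0
        for t in targets:
            u = t - th
            # Under-estimates (u >= 0) are weighted by tau,
            # over-estimates (u < 0) by 1 - tau.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            per_quantile += weight * huber(u, kappa) / kappa
        total += per_quantile / len(targets)
    return total
```

Minimizing this loss pulls each entry of `theta` toward the corresponding quantile of the target return distribution, so the agent keeps a set of quantiles rather than a single expected value.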
Pages: 169-185
Number of pages: 17