Oikonomos-II: A Reinforcement-Learning, Resource-Recommendation System for Cloud HPC

Times cited: 0
Authors
Betting, J. L. F. [1]
De Zeeuw, C. I. [1, 2]
Strydis, C. [1, 3]
Affiliations
[1] Erasmus MC, Dept Neurosci, Rotterdam, Netherlands
[2] Netherlands Inst Neurosci, Amsterdam, Netherlands
[3] Delft Univ Technol, Quantum & Comp Engn Dept, Delft, Netherlands
Funding
Dutch Research Council
Keywords
High-Performance Computing; resource recommendation; cloud computing; prediction; middleware
DOI
10.1109/HiPC58850.2023.00044
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
The cloud has become a powerful environment for deploying High-Performance Computing (HPC) applications, but the large number of available instance types makes selecting the optimal platform difficult, and users often lack the time or knowledge needed to make an optimal choice. Recommender systems have been developed for this purpose, but current state-of-the-art systems either require large amounts of training data or require running the application multiple times, both of which are costly. In this work, we propose Oikonomos-II, a reinforcement-learning-based resource-recommendation system for HPC applications in the cloud. Oikonomos-II models the relationship between an application's input parameters, the instance types, and the resulting execution times. It requires neither preexisting training data nor repeated job executions: using a variant of the Neural-LinUCB algorithm, it gathers its own training data opportunistically from user-submitted jobs. When deployed on a mix of HPC applications, Oikonomos-II quickly converged towards an optimal policy. By avoiding preexisting training data and auxiliary runs, it provides an economical, general-purpose resource-recommendation system for cloud HPC.
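The abstract states that Oikonomos-II learns from user-submitted jobs with a variant of Neural-LinUCB. As a rough illustration of the underlying contextual-bandit idea only, the Python sketch below implements plain disjoint LinUCB, not the authors' Neural-LinUCB variant: the learned neural representation is replaced by a fixed, hand-chosen feature vector, and each instance type is an arm with its own ridge-regression model plus an upper-confidence exploration bonus. The class `LinUCBRecommender`, the instance-type names `small` and `large`, the linear runtime model, and the reward (negative execution time) are all hypothetical assumptions. Each simulated job runs exactly once, on the recommended instance type, and its runtime becomes a new training sample, mirroring the opportunistic data gathering the abstract describes.

```python
import numpy as np


class LinUCBRecommender:
    """Disjoint LinUCB: one ridge-regression model per instance type (arm).

    Hypothetical sketch; NOT the Oikonomos-II implementation. A fixed
    feature vector stands in for Neural-LinUCB's learned representation.
    """

    def __init__(self, instance_types, n_features, alpha=1.0, reg=1.0):
        self.instance_types = list(instance_types)
        self.alpha = alpha  # width of the exploration bonus (assumed value)
        # Per-arm statistics: A_a = reg * I + sum(x x^T), b_a = sum(r * x)
        self.A = {a: reg * np.eye(n_features) for a in self.instance_types}
        self.b = {a: np.zeros(n_features) for a in self.instance_types}

    def ucb_scores(self, x):
        """Upper confidence bound on each instance type's reward for context x."""
        scores = {}
        for a in self.instance_types:
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]  # ridge-regression estimate
            scores[a] = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return scores

    def recommend(self, x):
        scores = self.ucb_scores(x)
        return max(self.instance_types, key=scores.get)

    def update(self, instance_type, x, reward):
        """Record the observed reward of the job that actually ran."""
        self.A[instance_type] += np.outer(x, x)
        self.b[instance_type] += reward * x


# Hypothetical ground truth, unknown to the recommender:
# runtime = intercept + slope * problem_size for each instance type.
TRUE_RUNTIME = {"small": (0.0, 3.0), "large": (2.0, 1.0)}


def features(problem_size):
    """Hypothetical context: a bias term plus one job input parameter."""
    return np.array([1.0, problem_size])


def simulate(n_jobs=1000, seed=0):
    rng = np.random.default_rng(seed)
    rec = LinUCBRecommender(TRUE_RUNTIME, n_features=2)
    for _ in range(n_jobs):
        size = rng.uniform(0.5, 2.0)     # a user-submitted job
        x = features(size)
        choice = rec.recommend(x)        # the job runs once, on this type only
        c0, c1 = TRUE_RUNTIME[choice]
        runtime = c0 + c1 * size         # noise-free for simplicity
        rec.update(choice, x, -runtime)  # reward: negative execution time
    return rec


if __name__ == "__main__":
    rec = simulate()
    for size in (0.5, 2.0):
        print(size, rec.recommend(features(size)))
```

Under the hypothetical runtime model, `small` is faster for problem sizes below 1.0 and `large` above it, so after enough jobs the sketch should favour `small` for size 0.5 and `large` for size 2.0. A real system would also have to handle noisy runtimes, many more input parameters, and possibly a cost-based rather than time-based reward, none of which this sketch models.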
Pages: 266-276
Number of pages: 11
Related papers (50 in total)
  • [21] Maze-solving in a plasma system based on functional analogies to reinforcement-learning model
    Sakai, Osamu
    Karasaki, Toshifusa
    Ito, Tsuyohito
    Murakami, Tomoyuki
    Tanaka, Manabu
    Kambara, Makoto
    Hirayama, Satoshi
    [J]. PLOS ONE, 2024, 19 (04)
  • [22] A Novel Adaptive Resource Allocation Model Based on SMDP and Reinforcement Learning Algorithm in Vehicular Cloud System
    Liang, Hongbin
    Zhang, Xiaohui
    Zhang, Jin
    Li, Qizhen
    Zhou, Shuya
    Zhao, Lian
    [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2019, 68 (10) : 10018 - 10029
  • [23] Reinforcement Learning Approach for Optimizing Cloud Resource Utilization With Load Balancing
    Lahande, Prathamesh Vijay
    Kaveri, Parag Ravikant
    Saini, Jatinderkumar R.
    Kotecha, Ketan
    Alfarhood, Sultan
    [J]. IEEE ACCESS, 2023, 11 : 127567 - 127577
  • [24] A Reinforcement Learning-Based Resource Allocation Scheme for Cloud Robotics
    Liu, Hang
    Liu, Shiwen
    Zheng, Kan
    [J]. IEEE ACCESS, 2018, 6 : 17215 - 17222
  • [25] Multi-resource interleaving for task scheduling in cloud-edge system by deep reinforcement learning
    Pei, Xinglong
    Sun, Penghao
    Hu, Yuxiang
    Li, Dan
    Tian, Le
    Li, Ziyong
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 160 : 522 - 536
  • [26] Efficient Adaptive Resource Provisioning for Cloud Applications using Reinforcement Learning
    John, Indu
    Bhatnagar, Shalabh
    Sreekantan, Aiswarya
    [J]. 2019 IEEE 4TH INTERNATIONAL WORKSHOPS ON FOUNDATIONS AND APPLICATIONS OF SELF* SYSTEMS (FAS*W 2019), 2019 : 271 - 272
  • [27] Reinforcement Learning to Improve Resource Scheduling and Load Balancing in Cloud Computing
    Kaveri, P. R.
    Lahande, P.
    [J]. SN COMPUTER SCIENCE, 4 (2)
  • [28] Resource Scheduling for Offline Cloud Computing Using Deep Reinforcement Learning
    El-Boghdadi, Hatem M.
    Ramadan, Rabie A.
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2019, 19 (04) : 54 - 60
  • [29] Resource Management in Multi-Cloud Scenarios via Reinforcement Learning
    Pietrabissa, Antonio
    Battilotti, Stefano
    Facchinei, Francisco
    Giuseppi, Alessandro
    Oddi, Guido
    Panfili, Martina
    Suraci, Vincenzo
    [J]. 2015 34TH CHINESE CONTROL CONFERENCE (CCC), 2015 : 9084 - 9089
  • [30] COUNSEL: Cloud Resource Configuration Management using Deep Reinforcement Learning
    Hegde, Adithya
    Kulkarni, Sameer G.
    Prasad, Abhinandan S.
    [J]. 2023 IEEE/ACM 23RD INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING, CCGRID, 2023 : 286 - 298