Oikonomos-II: A Reinforcement-Learning, Resource-Recommendation System for Cloud HPC

Times cited: 0

Authors:

Betting, J. L. F. [1]

De Zeeuw, C. I. [1,2]

Strydis, C. [1,3]

Affiliations:

[1] Erasmus MC, Department of Neuroscience, Rotterdam, Netherlands

[2] Netherlands Institute for Neuroscience, Amsterdam, Netherlands

[3] Delft University of Technology, Department of Quantum & Computer Engineering, Delft, Netherlands

Source:

2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC 2023), 2023

Funding:

Dutch Research Council

Keywords:

High-Performance Computing; resource recommendation; cloud computing; prediction; middleware

DOI:

10.1109/HiPC58850.2023.00044

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

The cloud has become a powerful and useful environment for deploying High-Performance Computing (HPC) applications, but the large number of available instance types makes selecting the optimal platform challenging. Users often lack the time or knowledge needed to make an optimal choice. Recommender systems have been developed for this purpose, but current state-of-the-art systems either require large amounts of training data or require running the application multiple times, both of which are costly. In this work, we propose Oikonomos-II, a resource-recommendation system based on reinforcement learning for HPC applications in the cloud. Oikonomos-II models the relationship between input parameters, instance types, and execution times. The system does not require any preexisting training data or repeated job executions: it gathers its own training data opportunistically from user-submitted jobs, using a variant of the Neural-LinUCB algorithm. When deployed on a mix of HPC applications, Oikonomos-II quickly converged towards an optimal policy. By eliminating the need for preexisting training data or auxiliary runs, Oikonomos-II provides an economical, general-purpose resource-recommendation system for cloud HPC.
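
The abstract names a variant of Neural-LinUCB but does not describe it, so the following is only a rough illustration of the underlying contextual-bandit loop, not the authors' method. It is a minimal sketch of plain LinUCB under several assumptions: a fixed, hypothetical feature map (`features`) stands in for the learned neural embedding that Neural-LinUCB would use; the instance types, vCPU counts, and prices are made up; and the cost of a choice is taken to be runtime times hourly price. Because cost is minimized, the usual upper confidence bound becomes a lower bound on cost. All names (`InstanceType`, `LinUCBRecommender`, `run_online`, `alpha`, `reg`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InstanceType:
    """A cloud instance type (names, vCPU counts, and prices are hypothetical)."""
    name: str
    vcpus: int
    price_per_hour: float


def features(problem_size: float, inst: InstanceType) -> list[float]:
    """Hypothetical hand-crafted feature map for one (job, instance) pair.

    Neural-LinUCB would instead use the last hidden layer of a network trained
    on (input parameters, instance type) -> runtime; a fixed map keeps this small.
    """
    return [1.0, problem_size, problem_size / inst.vcpus]


class LinUCBRecommender:
    """Plain LinUCB over one shared linear runtime model (not the paper's variant)."""

    def __init__(self, dim: int, alpha: float = 1.0, reg: float = 1.0):
        self.alpha = alpha  # exploration strength
        # Keep A^{-1} directly, where A = reg * I + sum(x x^T); start at I / reg.
        self.a_inv = [[1.0 / reg if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum(runtime * x) over observed jobs

    @staticmethod
    def _matvec(m: list[list[float]], v: list[float]) -> list[float]:
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def theta(self) -> list[float]:
        """Ridge-regression estimate of the runtime coefficients."""
        return self._matvec(self.a_inv, self.b)

    def score(self, x: list[float], price_per_hour: float) -> float:
        """Optimistic (lower-confidence-bound) cost estimate for one option."""
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        ax = self._matvec(self.a_inv, x)
        width = math.sqrt(max(sum(xi * ai for xi, ai in zip(x, ax)), 0.0))
        # Lower bound on runtime (hours) times price: optimism for a minimized cost.
        return price_per_hour * (mean - self.alpha * width)

    def recommend(self, problem_size: float,
                  instances: list[InstanceType]) -> InstanceType:
        """Pick the instance type with the lowest optimistic cost estimate."""
        return min(instances, key=lambda inst: self.score(
            features(problem_size, inst), inst.price_per_hour))

    def update(self, x: list[float], runtime_hours: float) -> None:
        """Add one observed job via a Sherman-Morrison rank-one update of A^{-1}."""
        ax = self._matvec(self.a_inv, x)  # A^{-1} x (A^{-1} is symmetric)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, ax))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
            self.b[i] += runtime_hours * x[i]


def run_online(rec: LinUCBRecommender, job_sizes: list[float],
               instances: list[InstanceType],
               run_job: Callable[[float, InstanceType], float]) -> list[InstanceType]:
    """Recommend an instance for each submitted job, then learn from its runtime."""
    choices = []
    for size in job_sizes:
        inst = rec.recommend(size, instances)
        runtime = run_job(size, inst)  # in a real system: the user's actual job
        rec.update(features(size, inst), runtime)
        choices.append(inst)
    return choices
```

Keeping A^{-1} up to date with a rank-one Sherman-Morrison update avoids re-inverting a matrix after every job. The lower confidence bound makes rarely tried instance types look cheap until they have been measured, which is how a bandit of this kind can collect training data from ordinary user jobs instead of auxiliary runs.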


Pages: 266-276

Page count: 11
