Oikonomos-II: A Reinforcement-Learning, Resource-Recommendation System for Cloud HPC

Cited by: 0

Authors:

Betting, J. L. F. [1]

De Zeeuw, C. I. [1,2]

Strydis, C. [1,3]

Affiliations:

[1] Erasmus MC, Dept Neurosci, Rotterdam, Netherlands

[2] Netherlands Inst Neurosci, Amsterdam, Netherlands

[3] Delft Univ Technol, Quantum & Comp Engn Dept, Delft, Netherlands

Source:

2023 IEEE 30TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC 2023 | 2023

Funding:

Dutch Research Council

Keywords:

High-Performance Computing; resource recommendation; cloud computing; prediction; middleware

DOI:

10.1109/HiPC58850.2023.00044

Chinese Library Classification number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The cloud has become a powerful and useful environment for the deployment of High-Performance Computing (HPC) applications, but the large number of available instance types poses a challenge in selecting the optimal platform. Users often do not have the time or knowledge necessary to make an optimal choice. Recommender systems have been developed for this purpose but current state-of-the-art systems either require large amounts of training data, or require running the application multiple times; this is costly. In this work, we propose Oikonomos-II, a resource-recommendation system based on reinforcement learning for HPC applications in the cloud. Oikonomos-II models the relationship between different input parameters, instance types, and execution times. The system does not require any preexisting training data or repeated job executions, as it gathers its own training data opportunistically using user-submitted jobs, employing a variant of the Neural-LinUCB algorithm. When deployed on a mix of HPC applications, Oikonomos-II quickly converged towards an optimal policy. The system eliminates the need for preexisting training data or auxiliary runs, providing an economical, general-purpose, resource-recommendation system for cloud HPC.
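The abstract names a variant of Neural-LinUCB but gives no implementation details. As a rough illustration of the underlying idea only, the sketch below shows plain (non-neural) disjoint LinUCB, where each arm stands for a cloud instance type and the context vector stands for a job's input parameters. It is not the authors' method: the class name `LinUCB`, the exploration weight `alpha`, and the choice of negative execution time as the reward are all assumptions made for this example.

```python
import numpy as np


class LinUCB:
    """Plain disjoint LinUCB: one ridge-regression model per arm.

    Hypothetical illustration only; the paper uses a Neural-LinUCB
    variant whose details are not given in the abstract.
    """

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed value)
        # Per-arm ridge-regression statistics: A = I + sum(x x^T), b = sum(r x)
        self.A = [np.eye(n_features) for _ in range(n_arms)]
        self.b = [np.zeros(n_features) for _ in range(n_arms)]

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                          # estimated reward weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # uncertainty bonus
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold one observed (context, reward) pair into the chosen arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Each user-submitted job yields one observation, so no auxiliary runs are needed.
bandit = LinUCB(n_arms=2, n_features=2, alpha=0.1)
job = np.array([1.0, 0.5])           # hypothetical job input parameters
arm = bandit.select(job)             # recommended instance type (arm index)
bandit.update(arm, job, reward=-1.0)  # reward assumed to be negative runtime
```

In this sketch the recommender learns only from jobs it actually places, which mirrors the opportunistic data gathering described in the abstract; a neural variant would replace the raw context `x` with features learned by a network.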


Pages: 266-276

Number of pages: 11


