Oikonomos-II: A Reinforcement-Learning, Resource-Recommendation System for Cloud HPC

Times cited: 0
Authors
Betting, J. L. F. [1]
De Zeeuw, C. I. [1, 2]
Strydis, C. [1, 3]
Affiliations
[1] Erasmus MC, Dept Neurosci, Rotterdam, Netherlands
[2] Netherlands Inst Neurosci, Amsterdam, Netherlands
[3] Delft Univ Technol, Quantum & Comp Engn Dept, Delft, Netherlands
Funding
Dutch Research Council;
Keywords
High-Performance Computing; resource recommendation; cloud computing; prediction; middleware;
DOI
10.1109/HiPC58850.2023.00044
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The cloud has become a powerful and useful environment for the deployment of High-Performance Computing (HPC) applications, but the large number of available instance types poses a challenge in selecting the optimal platform. Users often do not have the time or knowledge necessary to make an optimal choice. Recommender systems have been developed for this purpose but current state-of-the-art systems either require large amounts of training data, or require running the application multiple times; this is costly. In this work, we propose Oikonomos-II, a resource-recommendation system based on reinforcement learning for HPC applications in the cloud. Oikonomos-II models the relationship between different input parameters, instance types, and execution times. The system does not require any preexisting training data or repeated job executions, as it gathers its own training data opportunistically using user-submitted jobs, employing a variant of the Neural-LinUCB algorithm. When deployed on a mix of HPC applications, Oikonomos-II quickly converged towards an optimal policy. The system eliminates the need for preexisting training data or auxiliary runs, providing an economical, general-purpose, resource-recommendation system for cloud HPC.
Pages: 266-276
Page count: 11