Oikonomos-II: A Reinforcement-Learning, Resource-Recommendation System for Cloud HPC

Authors
Betting, J. L. F. [1]
De Zeeuw, C. I. [1, 2]
Strydis, C. [1, 3]
Affiliations
[1] Erasmus MC, Dept Neurosci, Rotterdam, Netherlands
[2] Netherlands Inst Neurosci, Amsterdam, Netherlands
[3] Delft Univ Technol, Quantum & Comp Engn Dept, Delft, Netherlands
Funding
Dutch Research Council;
Keywords
High-Performance Computing; resource recommendation; cloud computing; prediction; middleware;
DOI
10.1109/HiPC58850.2023.00044
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The cloud has become a powerful and useful environment for the deployment of High-Performance Computing (HPC) applications, but the large number of available instance types makes it challenging to select the optimal platform. Users often do not have the time or knowledge necessary to make an optimal choice. Recommender systems have been developed for this purpose, but current state-of-the-art systems require either large amounts of training data or multiple runs of the application, both of which are costly. In this work, we propose Oikonomos-II, a resource-recommendation system based on reinforcement learning for HPC applications in the cloud. Oikonomos-II models the relationship between different input parameters, instance types, and execution times. The system does not require any preexisting training data or repeated job executions, as it gathers its own training data opportunistically using user-submitted jobs, employing a variant of the Neural-LinUCB algorithm. When deployed on a mix of HPC applications, Oikonomos-II quickly converged towards an optimal policy. By eliminating the need for preexisting training data or auxiliary runs, the system provides an economical, general-purpose, resource-recommendation system for cloud HPC.
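To illustrate the idea of learning opportunistically from user-submitted jobs, the following is a minimal sketch of plain (disjoint) LinUCB, the linear contextual bandit that Neural-LinUCB builds on. It is not the authors' implementation and omits the neural feature extractor entirely. The three instance types, the two-element context (a bias term plus a normalized problem size), the synthetic runtime model, and the choice of negative runtime as the reward are all assumptions made for illustration.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (instance type)."""

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed value)
        # Per-arm design matrix A (identity prior) and response vector b.
        self.A = np.stack([np.eye(n_features) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, n_features))

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # ridge estimate of the arm's reward weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # uncertainty bonus
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold one observed (context, reward) pair into the chosen arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


def simulate(rounds=300, seed=0):
    """Run the bandit on synthetic jobs; return how often each arm was chosen."""
    rng = np.random.default_rng(seed)
    # Hypothetical runtime model per instance type: base + slope * size.
    base = np.array([5.0, 3.0, 1.0])
    slope = np.array([5.0, 3.0, 1.0])
    bandit = LinUCB(n_arms=3, n_features=2)
    counts = np.zeros(3, dtype=int)
    for _ in range(rounds):
        size = rng.uniform(0.0, 1.0)       # hypothetical job input parameter
        x = np.array([1.0, size])          # context: bias term + problem size
        arm = bandit.select(x)             # recommend an instance type
        runtime = base[arm] + slope[arm] * size + rng.normal(0.0, 0.1)
        bandit.update(arm, x, reward=-runtime)  # shorter runtime = higher reward
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    print(simulate())
```

Each job is executed exactly once on the recommended instance type, and that single observation updates the model, so no separate training runs are needed. In this synthetic setup the third instance type is always fastest, so the selection counts should concentrate on it after a short exploration phase.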
Pages: 266-276
Page count: 11