HephaestusForge: Optimal microservice deployment across the Compute Continuum via Reinforcement Learning

Cited by: 0
Authors
Santos, Jose [1 ]
Zaccarini, Mattia [2 ]
Poltronieri, Filippo [2 ]
Tortonesi, Mauro [2 ]
Stefanelli, Cesare [2 ]
Di Cicco, Nicola [3 ]
De Turck, Filip [1 ]
Affiliations
[1] Univ Ghent, Dept Informat Technol, IDLab, Imec, Technol Pk Zwijnaarde 126, B-9052 Ghent, Belgium
[2] Univ Ferrara, Distributed Syst Res Grp, Ferrara, Italy
[3] Politecn Milan, Dept Elect Informat & Bioengn DEIB, Milan, Italy
Source
FUTURE GENERATION COMPUTER SYSTEMS: THE INTERNATIONAL JOURNAL OF ESCIENCE, 2025, Vol. 166
Keywords
Kubernetes; Orchestration; Microservices; Reinforcement Learning; Resource allocation; Compute Continuum; Service Function Chain; Cloud
DOI
10.1016/j.future.2024.107680
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Classification Code
081202
Abstract
With the advent of containerization technologies, microservices have revolutionized application deployment by converting monolithic software into groups of loosely coupled containers, aiming to offer greater flexibility and improve operational efficiency. This transition has made applications more complex, with a single application now consisting of tens to hundreds of microservices. Designing effective orchestration mechanisms remains a crucial challenge, especially for emerging distributed cloud paradigms such as the Compute Continuum (CC). Orchestration across multiple clusters is still not extensively explored in the literature, since most works consider single-cluster scenarios. In the CC scenario, the orchestrator must determine the optimal location for each microservice, deciding whether instances are deployed together or spread across different clusters, which significantly increases orchestration complexity. This paper addresses orchestration in a containerized CC environment by studying a Reinforcement Learning (RL) approach for efficient microservice deployment in Kubernetes (K8s) clusters, a widely adopted container orchestration platform. This work demonstrates the effectiveness of RL in achieving near-optimal deployment schemes under dynamic conditions, where network latency and resource capacity fluctuate. We extensively evaluate a multi-objective reward function that aims to minimize overall latency, reduce deployment costs, and promote fair distribution of microservice instances, and we compare it against typical heuristic-based approaches. Results from an implemented OpenAI Gym framework, named HephaestusForge, show that RL algorithms achieve minimal rejection rates (as low as 0.002%, 90x less than the baseline Karmada scheduler). Cost-aware strategies result in lower deployment costs (2.5 units), and latency-aware functions achieve lower latency (268-290 ms), improving by 1.5x and 1.3x, respectively, over the best-performing baselines. HephaestusForge is available in a public open-source repository, allowing researchers to validate their own placement algorithms. This study also highlights the adaptability of the DeepSets (DS) neural network in optimizing microservice placement across diverse multi-cluster setups without retraining. The DS neural network can handle inputs and outputs as arbitrarily sized sets, enabling the RL algorithm to learn a policy that is not bound to a fixed number of clusters.
Pages: 16
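
To make the approach described in the abstract more concrete, below is a minimal, illustrative sketch of an OpenAI Gym-style multi-cluster placement environment with a multi-objective reward combining latency, cost, and fairness terms. The class name, state features, reward weights, and penalty values are assumptions chosen for illustration; they are not the actual HephaestusForge API or its published reward function.

# Illustrative sketch only: names, features, and reward weights are assumptions,
# not the actual HephaestusForge environment or reward function.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class MultiClusterPlacementEnv(gym.Env):
    """Toy multi-cluster placement: pick a cluster for each microservice replica."""

    def __init__(self, n_clusters=4, replicas_per_episode=10, seed=0):
        super().__init__()
        self.n_clusters = n_clusters
        self.replicas_per_episode = replicas_per_episode
        self.rng = np.random.default_rng(seed)
        # Observation: per-cluster (free CPU, cost per replica, latency), flattened.
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(n_clusters * 3,), dtype=np.float32)
        # Action: index of the cluster that hosts the next replica.
        self.action_space = spaces.Discrete(n_clusters)

    def _obs(self):
        feats = np.stack([self.free_cpu, self.cost, self.latency], axis=1)
        return feats.astype(np.float32).flatten()

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.free_cpu = self.rng.uniform(4.0, 16.0, self.n_clusters)   # cores
        self.cost = self.rng.uniform(0.5, 2.0, self.n_clusters)        # cost units
        self.latency = self.rng.uniform(50.0, 400.0, self.n_clusters)  # ms
        self.placed = np.zeros(self.n_clusters)
        self.remaining = self.replicas_per_episode
        return self._obs(), {}

    def step(self, action):
        cpu_request = 1.0  # each replica requests one core in this toy model
        if self.free_cpu[action] < cpu_request:
            reward = -10.0  # infeasible placement: treated as a rejection penalty
        else:
            self.free_cpu[action] -= cpu_request
            self.placed[action] += 1
            # Fairness bonus: higher when replicas are spread evenly across clusters.
            fairness = 1.0 - self.placed.std() / (self.placed.sum() + 1e-6)
            # Multi-objective reward: low latency, low cost, fair distribution.
            reward = (-0.01 * self.latency[action]
                      - 1.0 * self.cost[action]
                      + 1.0 * fairness)
        self.remaining -= 1
        terminated = self.remaining == 0
        return self._obs(), reward, terminated, False, {}

A second sketch illustrates the DeepSets idea mentioned in the abstract: a shared per-cluster encoder plus a permutation-invariant set summary lets a policy score any number of clusters, which is the property that allows generalization across multi-cluster setups without retraining. The architecture and layer sizes below are assumed, not the paper's exact network.

# Assumed DeepSets-style scorer for illustration, not the paper's exact network.
import torch
import torch.nn as nn

class DeepSetsScorer(nn.Module):
    def __init__(self, feat_dim=3, hidden=64):
        super().__init__()
        # phi encodes each cluster independently with shared weights.
        self.phi = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        # rho scores each cluster given its embedding and a set-level summary.
        self.rho = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, clusters):                 # clusters: (n_clusters, feat_dim)
        encoded = self.phi(clusters)             # per-cluster embeddings
        context = encoded.mean(dim=0)            # permutation-invariant summary
        context = context.expand_as(encoded)     # broadcast summary to each cluster
        scores = self.rho(torch.cat([encoded, context], dim=-1)).squeeze(-1)
        return torch.softmax(scores, dim=0)      # placement distribution over clusters

For example, DeepSetsScorer()(torch.rand(6, 3)) and DeepSetsScorer()(torch.rand(12, 3)) both return a valid placement distribution, one over 6 clusters and one over 12, using the same weights, which mirrors the size-agnostic policy described in the abstract.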