DeepScaling: Autoscaling Microservices With Stable CPU Utilization for Large Scale Production Cloud Systems

Cited by: 0
Authors
Wang, Ziliang [1 ,2 ,3 ,4 ]
Zhu, Shiyi [5 ]
Li, Jianguo [5 ]
Jiang, Wei [5 ]
Ramakrishnan, K. K. [6 ]
Yan, Meng [7 ]
Zhang, Xiaohong [7 ]
Liu, Alex X. [5 ]
Affiliations
[1] Chongqing Univ, Key Lab Dependable Serv Comp Cyber Phys Soc, Minist Educ, Chongqing 400044, Peoples R China
[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 400044, Peoples R China
[3] Peking Univ PKU, Key Lab High Confidence Software Technol HCST, Minist Educ MOE, Beijing 100871, Peoples R China
[4] Peking Univ PKU, Sch Comp Sci SCS, Beijing 100871, Peoples R China
[5] Ant Grp, Hangzhou 310063, Peoples R China
[6] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[7] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China
Keywords
Microservices autoscaling; cloud systems; horizontal autoscaling; service quality; RESOURCE-MANAGEMENT; ELASTICITY;
DOI
10.1109/TNET.2024.3400953
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Cloud service providers often provision excessive resources to meet the desired Service Level Objectives (SLOs) by setting lower CPU utilization targets. This can waste resources and noticeably increase power consumption in large-scale cloud deployments. To address this issue, this paper presents DeepScaling, a solution for minimizing resource cost while ensuring that SLO requirements are met in a dynamic, large-scale, production microservice-based system. DeepScaling introduces three components that adaptively refine the target CPU utilization of servers in the data center and maintain it at a stable value, meeting SLO constraints while using the minimum amount of system resources. First, DeepScaling forecasts workloads for each service using a Spatio-temporal Graph Neural Network. Second, it estimates CPU utilization with a Deep Neural Network, considering factors such as periodic tasks and traffic. Finally, it uses a modified Deep Q-Network (DQN) to generate an autoscaling policy that controls service resources to maximize service stability while meeting SLOs. Evaluation of DeepScaling in Ant Group's large-scale cloud environment shows that it outperforms state-of-the-art autoscaling approaches in terms of maintaining stable performance and resource savings. The deployment of DeepScaling in a real-world environment of 1900+ microservices saves the provisioning of over 100,000 CPU cores per day, on average.
Pages: 3961 - 3976
Page count: 16
Related papers
50 in total
  • [1] DeepScaling: Microservices AutoScaling for Stable CPU Utilization in Large Scale Cloud Systems
    Wang, Ziliang
    Zhu, Shiyi
    Li, Jianguo
    Jiang, Wei
    Ramakrishnan, K. K.
    Zheng, Yangfei
    Yan, Meng
    Zhang, Xiaohong
    Liu, Alex X.
    PROCEEDINGS OF THE 13TH SYMPOSIUM ON CLOUD COMPUTING, SOCC 2022, 2022, : 16 - 30
  • [2] Impact of CPU Utilization Thresholds and Scaling Size on Autoscaling Cloud Resources
    Al-Haidari, F.
    Sqalli, M.
    Salah, K.
    2013 IEEE FIFTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), VOL 2, 2013, : 256 - 261
  • [3] AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems
    Qiu, Haoran
    Mao, Weichao
    Wang, Chen
    Franke, Hubertus
    Youssef, Alaa
    Kalbarczyk, Zbigniew T.
    Basar, Tamer
    Iyer, Ravishankar K.
    PROCEEDINGS OF THE 2023 USENIX ANNUAL TECHNICAL CONFERENCE, 2023, : 387 - 402
  • [4] JCallGraph: Tracing Microservices in Very Large Scale Container Cloud Platforms
    Liu, Haifeng
    Zhang, Jinjun
    Shan, Huasong
    Li, Min
    Chen, Yuan
    He, Xiaofeng
    Li, Xiaowei
    CLOUD COMPUTING - CLOUD 2019, 2019, 11513 : 287 - 302
  • [5] An Ecosystem for the Large-Scale Reuse of Microservices in a Cloud-Native Context
    Usman, Muhammad
    Badampudi, Deepika
    Smith, Chris
    Nayak, Himansu
    IEEE SOFTWARE, 2022, 39 (05) : 68 - 75
  • [6] Efficient Data Delivery Scheme for Large-Scale Microservices in Distributed Cloud Environment
    Pham, Van-Nam
    Hossain, Md. Delowar
    Lee, Ga-Won
    Huh, Eui-Nam
    APPLIED SCIENCES-BASEL, 2023, 13 (02):
  • [7] Practice of Alibaba cloud on elastic resource provisioning for large-scale microservices cluster
    Xu, Minxian
    Yang, Lei
    Wang, Yang
    Gao, Chengxi
    Wen, Linfeng
    Xu, Guoyao
    Zhang, Liping
    Ye, Kejiang
    Xu, Chengzhong
    SOFTWARE-PRACTICE & EXPERIENCE, 2024, 54 (01) : 39 - 57
  • [8] Varanus: In Situ Monitoring for Large Scale Cloud Systems
    Ward, Jonathan Stuart
    Barker, Adam
    2013 IEEE FIFTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), VOL 2, 2013, : 341 - 344
  • [9] Large-Scale Data Analysis on Cloud Systems
    Marozzo, Fabrizio
    Talia, Domenico
    Trunfio, Paolo
    ERCIM NEWS, 2012, (89) : 26 - 27
  • [10] SEMSim Cloud Service: Large-scale urban systems simulation in the cloud
    Zehe, Daniel
    Knoll, Alois
    Cai, Wentong
    Aydt, Heiko
    SIMULATION MODELLING PRACTICE AND THEORY, 2015, 58 : 157 - 171