Microservice Debugging with Checkpoint-Restart

Cited by: 1
Authors
Merino, Xavier [1]
Otero, Carlos E. [1]
Affiliations
[1] Florida Inst Technol, Dept Comp Engn & Sci, Melbourne, FL 32901 USA
Source
Keywords
checkpointing; debugging; microservices
DOI
10.1109/CloudSummit57601.2023.00016
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Debugging microservices in complex cloud-native deployments can be a daunting task due to interaction-based problems and challenges in reproducing such environments. Traditional fault localization approaches may be ineffective, leading to longer debugging times. To address these challenges, we propose utilizing checkpoint/restart (C/R) techniques to replicate buggy environments across different hardware configurations without code instrumentation or specialized kernels. Our approach integrates with existing debugging practices, making it adaptable and user-friendly. However, since C/R requires some downtime, we assess our approach's practicality by analyzing data from 13,000 observations and estimating the time required to capture a service's state. The minimal downtime introduced by our approach minimizes service interruption. This can be leveraged by operators to plan deployments, live debugging, maintenance, and game-day operations. By combining the power of C/R techniques with existing debugging practices, we aim to facilitate environment reproduction and reduce the iterative nature of the debugging process in complex cloud-native deployments.
Pages: 58-63
Page count: 6