Microservice Debugging with Checkpoint-Restart

Cited by: 1
Authors
Merino, Xavier [1]
Otero, Carlos E. [1]
Affiliation
[1] Florida Inst Technol, Dept Comp Engn & Sci, Melbourne, FL 32901 USA
Keywords
checkpointing; debugging; microservices
DOI
10.1109/CloudSummit57601.2023.00016
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Debugging microservices in complex cloud-native deployments can be a daunting task because many faults arise from interactions between services and because such environments are hard to reproduce. Traditional fault localization approaches may be ineffective in this setting, which lengthens debugging. To address these challenges, we propose using checkpoint/restart (C/R) techniques to replicate buggy environments across different hardware configurations without code instrumentation or specialized kernels. Our approach integrates with existing debugging practices, making it adaptable and user-friendly. However, because C/R requires some downtime, we assess the practicality of our approach by analyzing data from 13,000 observations and estimating the time required to capture a service's state. The downtime our approach introduces is small, so service interruption is minimal. Operators can use these estimates to plan deployments, live debugging, maintenance, and game-day operations. By combining C/R techniques with existing debugging practices, we aim to facilitate environment reproduction and reduce the iterative nature of debugging in complex cloud-native deployments.
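The abstract reports downtime estimates for capturing a service's state but does not say how capture time is modeled. As an illustrative sketch only, assuming checkpoint downtime grows roughly linearly with the amount of state being dumped (an assumption, not the authors' model), an operator could fit a least-squares line to past (state size, capture time) observations and check whether a planned checkpoint fits a maintenance window. The function names, units, and sample values below are hypothetical.

```python
"""Hypothetical sketch: estimate C/R checkpoint downtime from past observations.

Assumes (not from the paper) that capture time is roughly linear in the
size of the state being dumped, measured in megabytes.
"""
from statistics import fmean


def fit_downtime_model(sizes_mb, times_s):
    """Fit times_s ~= a + b * sizes_mb by ordinary least squares; returns (a, b)."""
    if len(sizes_mb) != len(times_s) or len(sizes_mb) < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = fmean(sizes_mb), fmean(times_s)
    sxx = sum((x - mean_x) ** 2 for x in sizes_mb)
    if sxx == 0:
        raise ValueError("state sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_mb, times_s))
    slope = sxy / sxx                    # seconds per MB of state
    intercept = mean_y - slope * mean_x  # fixed per-checkpoint overhead, seconds
    return intercept, slope


def predict_downtime(model, size_mb):
    """Predicted capture time in seconds for a service whose state is size_mb."""
    intercept, slope = model
    return intercept + slope * size_mb


def fits_window(model, size_mb, window_s, safety_factor=1.0):
    """True if the estimate, scaled by a safety factor, fits the planned window."""
    return predict_downtime(model, size_mb) * safety_factor <= window_s


if __name__ == "__main__":
    # Synthetic observations for illustration only (not data from the paper).
    sizes = [100, 200, 400, 800]
    times = [0.6, 1.1, 2.1, 4.1]
    model = fit_downtime_model(sizes, times)
    print("estimated downtime for 1000 MB:", round(predict_downtime(model, 1000), 2), "s")
    print("fits a 6 s window:", fits_window(model, 1000, 6.0))
```

With the synthetic observations above, the fit is 0.1 s of fixed overhead plus 0.005 s per MB, so a 1,000 MB service would be predicted to pause for about 5.1 s. A single-variable fit ignores factors such as disk and network throughput, so a safety factor above 1.0 is a conservative choice when planning real windows.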
Pages: 58-63
Page count: 6