Transparent Fault Tolerance for Stateful Applications in Kubernetes with Checkpoint/Restore

Citations: 0
Authors
Schmidt, Henri [1]
Rejiba, Zeineb [2]
Eidenbenz, Raphael [2]
Foerster, Klaus-Tycho [1]
Affiliations
[1] TU Dortmund, Dortmund, Germany
[2] Hitachi Energy Res, Dattwil, Switzerland
Keywords
fault tolerance; container orchestration; SYSTEM;
DOI
10.1109/SRDS60354.2023.00022
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents a solution that provides fault tolerance for stateful containerized applications and is transparent, i.e., the application is not required to structure or manage its state in any particular fashion. When faults such as node crashes or node isolation occur, the application resumes execution on another node. The solution relies on a Kubernetes operator and a tool to periodically checkpoint containers and restore them from the latest checkpoints in case of a node failure. Experimental evaluations reveal the trade-offs between the overhead due to checkpointing (CPU load, memory, network bandwidth, reduced availability) and the performance during recovery (outage time, state quality). Compared to a nontransparent solution, the transparent solution yields
Pages: 129-139
Number of pages: 11