Transparent Fault Tolerance for Stateful Applications in Kubernetes with Checkpoint/Restore

Authors
Schmidt, Henri [1]
Rejiba, Zeineb [2]
Eidenbenz, Raphael [2]
Foerster, Klaus-Tycho [1]
Affiliations
[1] TU Dortmund, Dortmund, Germany
[2] Hitachi Energy Research, Dättwil, Switzerland
Keywords
fault tolerance; container orchestration; SYSTEM
DOI
10.1109/SRDS60354.2023.00022
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a solution that provides fault tolerance for stateful containerized applications and that is transparent, i.e., the application is not required to structure or manage its state in any particular fashion. When faults occur, such as node crashes or node isolation, the application resumes execution on another node. The solution relies on a Kubernetes operator and a tool to periodically checkpoint containers and to restore them from the latest checkpoint after a node failure. Experimental evaluations reveal the trade-offs between the overhead due to checkpointing (CPU load, memory, network bandwidth, reduced availability) and the performance during recovery (outage time, state quality). Compared to a nontransparent solution, the transparent solution yields
Pages: 129-139
Page count: 11
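
Illustrative sketch (not from the paper). The abstract describes a trade-off between checkpointing overhead and recovery performance. The Python snippet below is a minimal model of that trade-off under simplifying assumptions: checkpoints are taken at fixed multiples of an interval starting at time 0, each checkpoint pauses the application for a fixed cost, and the outage is the sum of a failure-detection delay and a restore duration. All function names, parameters, and numbers are hypothetical and are not taken from the paper or from any Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    last_checkpoint_s: float  # time of the newest checkpoint taken before the failure
    lost_work_s: float        # progress made after that checkpoint, lost on restore
    outage_s: float           # time from failure until the restored container runs


def estimate_recovery(failure_time_s: float,
                      checkpoint_interval_s: float,
                      detection_delay_s: float,
                      restore_duration_s: float) -> RecoveryEstimate:
    """Estimate state loss and outage for one node failure (toy model)."""
    if checkpoint_interval_s <= 0:
        raise ValueError("checkpoint_interval_s must be positive")
    if failure_time_s < 0:
        raise ValueError("failure_time_s must be non-negative")

    # Assumption: checkpoints complete instantly at t = 0, T, 2T, ...
    completed = int(failure_time_s // checkpoint_interval_s)
    last_checkpoint_s = completed * checkpoint_interval_s

    return RecoveryEstimate(
        last_checkpoint_s=last_checkpoint_s,
        lost_work_s=failure_time_s - last_checkpoint_s,
        outage_s=detection_delay_s + restore_duration_s,
    )


def checkpoint_overhead_fraction(checkpoint_interval_s: float,
                                 checkpoint_cost_s: float) -> float:
    """Fraction of wall-clock time spent checkpointing (capped at 1.0)."""
    if checkpoint_interval_s <= 0:
        raise ValueError("checkpoint_interval_s must be positive")
    return min(1.0, checkpoint_cost_s / checkpoint_interval_s)
```

In this model, shortening the interval reduces the work lost at a failure (better state quality) but raises the fraction of time spent checkpointing. The outage time depends only on detection and restore, not on the interval. Real systems would also need to account for checkpoint transfer time and for checkpoints that are still in flight when the node fails.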
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON JOINT CLOUD COMPUTING, JCC, 2023, : 50 - 56