A Pipelined Multi-level Checkpoint Storage System for Virtual Cluster Checkpointing

Times cited: 0
Authors
Yaothanee, Jumpol [1]
Chanchio, Kasidit [1]
Affiliation
[1] Thammasat Univ, Dept Comp Sci, Patum Thanee, Thailand
Keywords
Pipelined Checkpoint Storage; Checkpoint-Restart; Fault Tolerance; Cluster Computing; Cloud Computing
DOI
10.1109/ICCCBDA56900.2023.10154743
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Checkpoint-Restart (CR) is an important mechanism for providing fault tolerance to long-running applications in the cloud. Virtual Cluster (VC) checkpointing is a mechanism for performing CR operations on a cluster of Virtual Machines (VMs). However, simultaneously saving the state of every VM in a VC directly to shared storage, such as a SAN, can cause high checkpoint overhead due to I/O contention. This paper presents a novel Pipelined Multi-Level Checkpoint Storage (PMLCS) system that allows users to schedule the saving of VM checkpoints on multiple storage devices to avoid I/O contention. We propose a novel conceptual diagram, called the Backup Chain (BC) diagram, to describe the scheduled saving and backing up of VM checkpoint files on various storage devices. We have implemented a PMLCS prototype and integrated it with two hypervisor-level VC checkpointing systems. We conducted experiments by checkpointing a VC running an MPI program from the NAS Parallel Benchmarks. Experimental results show that PMLCS, using a BC diagram that represents a combination of SSD and SAN storage, is an efficient and scalable checkpoint storage solution.
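The abstract does not say how a BC diagram is encoded or how PMLCS actually schedules transfers. As a rough illustration only, the toy bandwidth model below contrasts every VM writing to a shared SAN at once with checkpoints flowing through a chain of tiers (per-host SSD first, then SAN one VM at a time). Everything here is hypothetical and not the authors' design: the names `Tier`, `direct_save_time` and `pipelined_schedule`, the bandwidth figures, ideal fair sharing of the SAN, and the assumption that a VM can resume once its checkpoint reaches the first tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One storage level in a hypothetical backup chain (toy model)."""
    name: str
    bandwidth_mb_s: float  # write bandwidth of one device of this tier
    shared: bool           # True: one device shared by all VMs (e.g. a SAN)


def direct_save_time(n_vms: int, ckpt_mb: float, tier: Tier) -> float:
    """Seconds until all VMs finish writing to `tier` concurrently."""
    if tier.shared:
        # Idealized fair sharing: n writers split one device's bandwidth.
        return n_vms * ckpt_mb / tier.bandwidth_mb_s
    # Each VM has its own device, so writes proceed independently.
    return ckpt_mb / tier.bandwidth_mb_s


def pipelined_schedule(n_vms: int, ckpt_mb: float,
                       chain: list[Tier]) -> list[list[float]]:
    """Finish time of every stage for every VM when checkpoint files flow
    through `chain` in order and a shared tier accepts one VM at a time."""
    ready = [0.0] * n_vms  # when each VM's file is ready for the next stage
    finish: list[list[float]] = [[] for _ in range(n_vms)]
    for tier in chain:
        duration = ckpt_mb / tier.bandwidth_mb_s
        device_free = 0.0  # next idle time of the single shared device
        # Serve VMs in order of readiness (stable sort keeps index order on ties).
        for vm in sorted(range(n_vms), key=lambda v: ready[v]):
            if tier.shared:
                # Wait for both the file and the shared device.
                start = max(ready[vm], device_free)
                device_free = start + duration
            else:
                # Private per-host device: start as soon as the file is ready.
                start = ready[vm]
            ready[vm] = start + duration
            finish[vm].append(ready[vm])
    return finish


if __name__ == "__main__":
    ssd = Tier("local SSD", 512.0, shared=False)
    san = Tier("SAN", 256.0, shared=True)
    print(direct_save_time(4, 1024.0, san))       # 16.0
    sched = pipelined_schedule(4, 1024.0, [ssd, san])
    print([stages[0] for stages in sched])        # [2.0, 2.0, 2.0, 2.0]
    print(max(stages[-1] for stages in sched))    # 18.0
```

With these made-up numbers, each VM is held for 2 s instead of 16 s, but the last copy reaches the SAN at 18 s. The toy model therefore trades a longer backup tail for a shorter VM-visible overhead, which is the kind of trade-off a staged, multi-level schedule is meant to exploit.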
Pages: 239-246
Number of pages: 8