A non-blocking Checkpointing algorithm for distributed systems

Cited by: 0
Authors
Guoliang L. [1]
Shuyu C. [1]
Xiaoqin Z. [1]
Affiliation
[1] College of Computer Science, Chongqing University, Chongqing
Keywords
Coordinated checkpointing; Distributed systems; Failure and recovery; Fault tolerance
DOI
10.4156/jdcta.vol5.issue7.29
Abstract
Checkpointing and rollback recovery is an effective fault-tolerance technique that is widely used in parallel and distributed computer systems. We present a non-blocking coordinated checkpointing algorithm for distributed systems that differs from the conventional approach, in which processes first take temporary checkpoints and then convert them to permanent ones. The proposed algorithm allows processes to take permanent checkpoints directly, without taking temporary checkpoints, which contributes to its speed of execution. Orphan messages are eliminated by the sender processes, and in-transit messages are eliminated by the checkpointing interval and a retransmission mechanism. The algorithm reduces the control-message complexity of taking a checkpoint from O(n^2) to O(n), and the number of control messages is reduced to n-1.
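The n-1 control-message count can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given in this record. It assumes that n is the number of processes, that a single initiator sends one checkpoint request to each of the other processes, and that every process writes a permanent checkpoint directly. The names `Process`, `take_permanent_checkpoint`, and `coordinated_checkpoint` are hypothetical, and orphan and in-transit message handling is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    state: int = 0  # stand-in for the application state
    permanent: list = field(default_factory=list)  # stand-in for stable storage

    def take_permanent_checkpoint(self, seq: int) -> None:
        # Assumption: the checkpoint is saved as permanent right away,
        # with no temporary checkpoint taken first.
        self.permanent.append((seq, self.state))


def coordinated_checkpoint(processes, initiator_pid, seq):
    """Run one checkpoint round and return the number of control messages."""
    control_messages = 0
    for p in processes:
        if p.pid != initiator_pid:
            control_messages += 1  # one request from the initiator to p
        p.take_permanent_checkpoint(seq)
    return control_messages


procs = [Process(pid=i, state=i * 10) for i in range(5)]
sent = coordinated_checkpoint(procs, initiator_pid=0, seq=1)
```

With five processes, the round in this sketch uses four control messages, one per non-initiator process, so the count grows linearly with n.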
Pages: 230-238
Number of pages: 8