A non-blocking Checkpointing algorithm for distributed systems

Cited by: 0
Authors
Guoliang L. [1]
Shuyu C. [1]
Xiaoqin Z. [1]
Affiliations
[1] College of Computer Science, Chongqing University, Chongqing
Keywords
Coordinated checkpointing; Distributed systems; Failure and recovery; Fault tolerance
DOI
10.4156/jdcta.vol5.issue7.29
Abstract
Checkpointing and rollback recovery is an effective fault-tolerance technique that has been widely used in parallel and distributed computer systems. We present a non-blocking coordinated checkpointing algorithm for distributed systems that differs from the conventional approach, in which processes first take temporary checkpoints and then convert them to permanent ones. The proposed algorithm allows processes to take permanent checkpoints directly, without taking temporary checkpoints, which contributes to its execution speed. Orphan messages are eliminated by the sender processes, and in-transit messages are eliminated by the checkpointing interval and a retransmission mechanism. The algorithm reduces the control-message complexity of taking a checkpoint from O(n^2) to O(n), and the number of control messages is reduced to n-1.
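To make the control-message counts in the abstract concrete, here is a minimal sketch. It rests on two assumptions that the abstract does not state: that the quadratic baseline corresponds to every process sending one control message to every other process, and that the n-1 figure corresponds to a single initiator sending one request to each of the other processes. The function names are hypothetical and are not taken from the paper.

```python
def all_to_all_control_messages(n: int) -> int:
    """Assumed baseline: each of n processes notifies every other process."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * (n - 1)  # grows quadratically with n


def single_initiator_control_messages(n: int) -> int:
    """Assumed scheme: one initiator sends one request to each other process."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n - 1  # grows linearly with n


if __name__ == "__main__":
    for n in (2, 8, 64):
        print(n, all_to_all_control_messages(n), single_initiator_control_messages(n))
```

Under these assumptions, a system of 8 processes would exchange 56 control messages in the all-to-all case and 7 in the single-initiator case.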
Pages: 230 - 238
Number of pages: 8
Related papers
50 in total
  • [21] A new non-blocking counter-based coordinated checkpointing algorithm as a migration tool in a high performance dynamic Grid scheduler
    El-Sayed, GA
    Greensheids, IR
    PDPTA '04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS 1-3, 2004, : 217 - 223
  • [22] A non-blocking recovery algorithm for causal message logging
    Mitchell, JR
    Garg, VK
    SEVENTEENTH IEEE SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS, 1998, : 3 - 9
  • [23] A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling
    Song, Fengguang
    Dongarra, Jack
    Moore, Shirley
    COMPUTATIONAL SCIENCE - ICCS 2009, PART I, 2009, 5544 : 195 - 204
  • [24] Non-blocking disk-tape join algorithm for data on tertiary storage systems
    Liu, B
    Li, JZ
    Nie, L
    Zhang, YQ
    Fifth International Conference on Computer and Information Technology - Proceedings, 2005, : 58 - 64
  • [25] Non-blocking disk-tape join algorithm for data on tertiary storage systems
    School of Computer Science and Technology, Harbin Institute of Technology, China
    Inf. Technol. J., 2006, 1 (159-165)
  • [26] Using computing checkpoints implement consistent low-cost non-blocking coordinated checkpointing
    Men, C
    Yang, XZ
    PARALLEL AND DISTRIBUTED COMPUTING: APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2004, 3320 : 570 - 576
  • [27] Solving Non-Blocking Atomic Commitment Problem in Asynchronous Distributed Systems with Unreliable Failure Detectors
    Park, Sung-Hoon
    Lee, Seon-Hyong
    CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, 2012, 310 : 94 - 102
  • [28] Non-blocking PMD monitoring in live optical systems
    Hui, R.
    Saunders, R.
    Heffner, B.
    Richards, D.
    Fu, B.
    Adany, P.
    ELECTRONICS LETTERS, 2007, 43 (01) : 53 - 54
  • [29] An efficient and scalable checkpointing and recovery algorithm for distributed systems
    Kumar, K. P. Krishna
    Hansdah, R. C.
    DISTRIBUTED COMPUTING AND NETWORKING, PROCEEDINGS, 2006, 4308 : 94 - 99
  • [30] A Non-Blocking Self-Organizing Linked List Algorithm
    Tan, Longfei
    Han, Zhao
    Chen, Chunguang
    He, Yinghua
    Zhang, Kunlong
    2012 13TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS, AND TECHNOLOGIES (PDCAT 2012), 2012, : 71 - 76