On node state reconstruction for fault tolerant distributed algorithms

Cited by: 0
Authors
Okun, M [1]
Barak, A [1]
Affiliations
[1] Hebrew Univ Jerusalem, Inst Comp Sci, IL-91904 Jerusalem, Israel
Keywords
Distributed algorithms; fault tolerance; state reconstruction; recovery
DOI
10.1109/RELDIS.2002.1180184
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One of the main methods for achieving fault tolerance in distributed systems is recovery of the state of failed components. Though generic recovery methods like checkpointing and message logging exist, in many cases the recovery has to be application-specific. In this paper we propose a general model for node state reconstruction after crash failures. In our model the reconstruction operation is defined only by the requirements it fulfills, without referring to the specific, application-dependent way it is performed. The model provides a framework for formal treatment of algorithm-specific and system-specific recovery procedures. It is used to specify node state reconstruction procedures for several widely used distributed algorithms and systems, as well as to prove their correctness.
Pages: 160-168
Page count: 9
Related papers
50 in total
  • [41] Fault tolerant and distributed broadcast encryption
    D'Arco, P
    Stinson, DR
    TOPICS IN CRYPTOLOGY - CT-RSA 2003, PROCEEDINGS, 2003, 2612 : 263 - 280
  • [42] FAULT TOLERANT DISTRIBUTED MAJORITY COMMITMENT
    BARYEHUDA, R
    KUTTEN, S
    JOURNAL OF ALGORITHMS, 1988, 9 (04) : 568 - 582
  • [43] Fault-tolerant distributed simulation
    Damani, OP
    Garg, VK
    TWELFTH WORKSHOP ON PARALLEL AND DISTRIBUTED SIMULATION - PADS'98, PROCEEDINGS, 1998, : 38 - 45
  • [44] Node-to-node cluster fault tolerant routing in hypercubes
    Gu, QP
    Peng, ST
    THIRD INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS, AND NETWORKS, PROCEEDINGS (I-SPAN '97), 1997, : 404 - 409
  • [45] OPTIMAL FAULT-TOLERANT DISTRIBUTED ALGORITHMS FOR ELECTION IN COMPLETE NETWORKS WITH A GLOBAL SENSE OF DIRECTION
    MASUZAWA, T
    NISHIKAWA, N
    HAGIHARA, K
    TOKURA, N
    LECTURE NOTES IN COMPUTER SCIENCE, 1989, 392 : 171 - 182
  • [46] Checkpointing Algorithms for Fault-Tolerant Execution of Large-Scale Distributed Applications in Cloud
    Kumari, Priti
    Kaur, Parmeet
    WIRELESS PERSONAL COMMUNICATIONS, 2021, 117 (03) : 1853 - 1877
  • [47] Reconciling Fault-Tolerant Distributed Algorithms and Real-Time Computing (Extended Abstract)
    Moser, Heinrich
    Schmid, Ulrich
    STRUCTURAL INFORMATION AND COMMUNICATION COMPLEXITY, 2011, 6796 : 42 - 53
  • [48] Distributed fault-tolerant and auto-healing algorithms on dual-ring networks
    Ko, KW
    Lam, SF
    Yeung, CT
    Lam, WK
    Cheung, KW
    NETWORKS: THE NEXT MILLENNIUM - THE IEEE SINGAPORE INTERNATIONAL CONFERENCE ON NETWORKS 1997, IEEE SICON'97, 1997, : 471 - 485
  • [49] Checkpointing Algorithms for Fault-Tolerant Execution of Large-Scale Distributed Applications in Cloud
    Priti Kumari
    Parmeet Kaur
    Wireless Personal Communications, 2021, 117 : 1853 - 1877
  • [50] Distributed Bayesian algorithms for fault-tolerant event region detection in wireless sensor networks
    Krishnamachari, B
    Iyengar, S
    IEEE TRANSACTIONS ON COMPUTERS, 2004, 53 (03) : 241 - 250