On node state reconstruction for fault tolerant distributed algorithms

Authors
Okun, M [1]
Barak, A [1]
Affiliations
[1] Hebrew Univ Jerusalem, Inst Comp Sci, IL-91904 Jerusalem, Israel
Keywords
Distributed algorithms; fault tolerance; state reconstruction; recovery
DOI
10.1109/RELDIS.2002.1180184
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
One of the main methods for achieving fault tolerance in distributed systems is recovering the state of failed components. Though generic recovery methods such as checkpointing and message logging exist, in many cases the recovery has to be application-specific. In this paper we propose a general model for node state reconstruction after crash failures. In our model the reconstruction operation is defined only by the requirements it fulfills, without reference to the specific, application-dependent way in which it is performed. The model provides a framework for the formal treatment of algorithm-specific and system-specific recovery procedures. It is used to specify node state reconstruction procedures for several widely used distributed algorithms and systems, and to prove their correctness.
Pages: 160-168
Page count: 9