On node state reconstruction for fault tolerant distributed algorithms

Authors
Okun, M. [1]
Barak, A. [1]
Affiliation
[1] Hebrew Univ Jerusalem, Inst Comp Sci, IL-91904 Jerusalem, Israel
Keywords
Distributed algorithms; fault tolerance; state reconstruction; recovery
DOI
10.1109/RELDIS.2002.1180184
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One of the main methods for achieving fault tolerance in distributed systems is recovery of the state of failed components. Although generic recovery methods such as checkpointing and message logging exist, in many cases the recovery has to be application-specific. In this paper we propose a general model for node state reconstruction after crash failures. In our model the reconstruction operation is defined only by the requirements it fulfills, without referring to the specific, application-dependent way in which it is performed. The model provides a framework for the formal treatment of algorithm-specific and system-specific recovery procedures. It is used to specify node state reconstruction procedures for several widely used distributed algorithms and systems, as well as to prove their correctness.
Pages: 160-168
Number of pages: 9