On node state reconstruction for fault tolerant distributed algorithms

Authors
Okun, M [1]
Barak, A [1]
Affiliation
[1] Hebrew University of Jerusalem, Institute of Computer Science, IL-91904 Jerusalem, Israel
Keywords
Distributed algorithms; fault tolerance; state reconstruction; recovery
DOI
10.1109/RELDIS.2002.1180184
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One of the main methods for achieving fault tolerance in distributed systems is recovering the state of failed components. Although generic recovery methods such as checkpointing and message logging exist, in many cases recovery has to be application-specific. In this paper we propose a general model for node state reconstruction after crash failures. In our model, the reconstruction operation is defined only by the requirements it fulfills, without reference to the specific, application-dependent way in which it is performed. The model provides a framework for the formal treatment of algorithm-specific and system-specific recovery procedures. We use it to specify node state reconstruction procedures for several widely used distributed algorithms and systems, and to prove their correctness.
Pages: 160-168
Number of pages: 9