On verifying fault tolerance of distributed protocols

Authors
Fisman, Dana [1]
Kupferman, Orna [1]
Lustig, Yoad [1]
Affiliations
[1] Hebrew Univ Jerusalem, Sch Comp Sci & Engn, IL-91904 Jerusalem, Israel
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Distributed systems are composed of processes connected in some network. Distributed systems may suffer from faults: processes may stop, be interrupted, or be maliciously attacked. Fault-tolerant protocols are designed to be resistant to faults. Proving the resistance of protocols to faults is a very challenging problem, as it combines the parameterized setting that distributed systems are based on with the need to consider a hostile environment that produces the faults. Considering all the possible fault scenarios for a protocol is very difficult. Thus, reasoning about fault-tolerant protocols calls for formal methods. In this paper we describe a framework for verifying the fault tolerance of (synchronous or asynchronous) distributed protocols. In addition to the description of the protocol and the desired behavior, the user provides the fault type (e.g., fail-stop, Byzantine) and its distribution (e.g., at most half of the processes are faulty). Our framework is based on augmenting the description of the configurations of the system with a mask describing which processes are faulty. We focus on regular model checking and show how it is possible to compile the input for the model-checking problem into one that takes the faults and their distribution into account, and then perform regular model checking on the compiled input. We demonstrate the effectiveness of our framework and argue for its generality.
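The mask idea in the abstract can be illustrated with a small sketch. It is not the authors' construction: it assumes a fixed number of processes, whereas the paper works in the parameterized setting with regular model checking. All names (`augment`, `at_most_half_faulty`, `admissible_masks`) are hypothetical and chosen only for illustration.

```python
from itertools import product
from typing import Iterator, Tuple

# Hypothetical types: none of these identifiers come from the paper.
Config = Tuple[str, ...]                      # one local state per process
Mask = Tuple[bool, ...]                       # True marks a faulty process
MaskedConfig = Tuple[Tuple[str, bool], ...]   # configuration paired with its mask


def augment(config: Config, mask: Mask) -> MaskedConfig:
    """Pair each process state with its fault bit."""
    if len(config) != len(mask):
        raise ValueError("config and mask must have the same length")
    return tuple(zip(config, mask))


def at_most_half_faulty(mask: Mask) -> bool:
    """Example fault distribution: at most half of the processes are faulty."""
    return 2 * sum(mask) <= len(mask)


def admissible_masks(n: int) -> Iterator[Mask]:
    """Enumerate every mask over n processes that satisfies the distribution."""
    for bits in product((False, True), repeat=n):
        if at_most_half_faulty(bits):
            yield bits
```

Enumerating masks like this only works for a fixed, small number of processes; it is meant to make the notion of a mask-augmented configuration concrete, not to stand in for the compilation step described in the abstract.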
Pages: 315-331
Number of pages: 17
Related papers
50 in total
  • [31] Distributed Control and Communication Fault Tolerance for the CKBot
    Park, Michael
    Yim, Mark
    RECONFIGURABLE MECHANISMS AND ROBOTS, 2009, : 673 - 679
  • [32] Communication fault tolerance in distributed robotic systems
    Molnár, P
    Starke, J
    DISTRIBUTED AUTONOMOUS ROBOTIC SYSTEMS, 2000, : 99 - 108
  • [33] Dynamic fault tolerance in distributed simulation system
    Ma, Min
    Jin, Shiyao
    Ye, Chaoqun
    Liu, Xiaojian
    COMPUTATIONAL SCIENCE - ICCS 2006, PT 1, PROCEEDINGS, 2006, 3991 : 769 - 776
  • [34] Fault Tolerance Communication in Mobile Distributed Networks
    Suganth, D. Bhuvana
    Manjunath, R.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA ENGINEERING AND COMMUNICATION TECHNOLOGY, ICDECT 2016, VOL 1, 2017, 468 : 77 - 87
  • [35] Fault tolerance for distributed process control system
    Takizawa, H
    SICE 2002: PROCEEDINGS OF THE 41ST SICE ANNUAL CONFERENCE, VOLS 1-5, 2002, : 3259 - 3263
  • [36] Flexible fault tolerance in distributed enterprise communities
    Ionescu, Mihail
    INTERNATIONAL JOURNAL OF GRID AND UTILITY COMPUTING, 2012, 3 (04) : 224 - 232
  • [37] Optimizing fault tolerance in embedded distributed systems
    Draber, S
    IEEE MICRO, 2000, 20 (04) : 76 - 84
  • [38] Flexible Fault Tolerance in Distributed Enterprise Communities
    Ionescu, Mihail
    12TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2010), 2011, : 278 - 285
  • [39] Ensuring fault-tolerance in distributed media
    Tormasov, A.G.
    Khasin, M.A.
    Pakhomov, Yu.I.
    1600, Nauka Moscow (27): : 26 - 35
  • [40] Fault tolerance in a distributed CHORUS/MiX system
    Kittur, S
    Steel, D
    Armand, F
    Lipkis, J
    PROCEEDINGS OF THE USENIX 1996 ANNUAL TECHNICAL CONFERENCE, 1996, : 219 - 228