On verifying fault tolerance of distributed protocols

被引:0
|
作者
Fisman, Dana [1 ]
Kupferman, Orna [1 ]
Lustig, Yoad [1 ]
机构
[1] Hebrew Univ Jerusalem, Sch Comp Sci & Engn, IL-91904 Jerusalem, Israel
关键词
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Distributed systems are composed of processes connected in some network. Distributed systems may suffer from faults: processes may stop, be interrupted, or be maliciously attacked. Fault-tolerant protocols are designed to be resistant to faults. Proving the resistance of protocols to faults is a very challenging problem, as it combines the parameterized setting that distributed systems are based-on, with the need to consider a hostile environment that produces the faults. Considering all the possible fault scenarios for a protocol is very difficult. Thus, reasoning about fault-tolerance protocols utterly needs formal methods. In this paper we describe a framework for verifying the fault tolerance of (synchronous or asynchronous) distributed protocols. In addition to the description of the protocol and the desired behavior, the user provides the fault type (e.g., fail-stop, Byzantine) and its distribution (e.g., at most half of the processes are faulty). Our framework is based on augmenting the description of the configurations of the system by a mask describing which processes are faulty. We focus on regular model checking and show how it is possible to compile the input for the model-checking problem to one that takes the faults and their distribution into an account, and perform regular model-checking on the compiled input. We demonstrate the effectiveness of our framework and argue for its generality.
引用
收藏
页码:315 / 331
页数:17
相关论文
共 50 条
  • [21] LAN DISTRIBUTED FAULT-TOLERANCE
    MIROJULIA, J
    DECENTRALIZED AND DISTRIBUTED SYSTEMS, 1993, 39 : 161 - 174
  • [22] Distributed MapReduce Engine with Fault Tolerance
    Song, Lixing
    Wu, Shaoen
    Wang, Honggang
    Yang, Qing
    2014 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2014, : 3626 - 3630
  • [23] Verifying Reliable Network Components in a Distributed Separation Logic with Dependent Separation Protocols
    Gondelman, Leon
    Hinrichsen, Jonas Kastberg
    Pereira, Mario
    Timany, Amin
    Birkedal, Lars
    PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 2023, 7 (ICFP): : 847 - 877
  • [24] Verifying Fault-tolerance in Parameterised Multi-Agent Systems
    Kouvaros, Panagiotis
    Lomuscio, Alessio
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 288 - 294
  • [25] Self fault-tolerance of protocols: A case study
    Li, Layuan
    Li, Chunlin
    Journal of Systems Engineering and Electronics, 2000, 11 (03) : 28 - 34
  • [26] Self Fault-Tolerance of Protocols: A Case Study
    Li Layuan & Li Chunlin (Dept. of Computer Science & Technology
    JournalofSystemsEngineeringandElectronics, 2000, (03) : 28 - 34
  • [27] Distributed fault tolerance in optimal interpolative nets
    Simon, D
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2001, 12 (06): : 1348 - 1357
  • [28] Fault tolerance in distributed industrial control systems
    Campelo, JC
    Rubio, A
    Rodríguez, F
    Serrano, JJ
    PROCEEDINGS OF THE COMMUNICATION NETWORKS AND DISTRIBUTED SYSTEMS MODELING AND SIMULATION (CNDS'98), 1998, : 87 - 92
  • [29] Fault Tolerance Model for Hadoop Distributed System
    Ahmed, Soraya Setti
    Slimani, Yahya
    Frefita, Riadh
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2025, 31 (01) : 72 - 92
  • [30] Fault-tolerance in distributed query processing
    Smith, J
    Watson, P
    9TH INTERNATIONAL DATABASE ENGINEERING & APPLICATION SYMPOSIUM, PROCEEDINGS, 2005, : 329 - 338