Transparent three-phase Byzantine fault tolerance for parallel and distributed simulations

Authors
Li, Zengxiang [1]
Cai, Wentong [2]
Turner, Stephen John [2]
Qin, Zheng [1]
Goh, Rick Siow Mong [1]
Affiliations
[1] Inst High Performance Comp, Singapore 138632, Singapore
[2] Nanyang Technol Univ, Singapore 639798, Singapore
Keywords
Parallel and distributed simulation; Byzantine fault tolerance; Replication; Checkpoint; Epidemic effect; Time synchronization; Mechanism
DOI
10.1016/j.simpat.2015.09.012
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
A parallel and distributed simulation (federation) is composed of a number of simulation components (federates). Since the federates may be developed by different participants and executed on different platforms, they are subject to Byzantine failures. Moreover, a failure may propagate through the federation, resulting in an epidemic effect. In this article, a three-phase (i.e., detection, location, and recovery) Byzantine Fault Tolerance (BFT) mechanism is proposed based on a transparent middleware approach. Replication, checkpointing, and message logging techniques are integrated in the mechanism to enhance simulation performance and reduce fault tolerance cost. In addition, mechanisms are provided to remove the epidemic effects of Byzantine failures. Our experiments have verified the correctness of the three-phase BFT mechanism and illustrated its high efficiency and good scalability. For some simulation executions, the BFT mechanism may even achieve performance enhancement and Byzantine fault tolerance simultaneously. (C) 2015 Elsevier B.V. All rights reserved.
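The abstract names the three phases but not their algorithms, so the following is only a generic illustration of how detection, location, and recovery can fit together over replicated components. It is not the paper's mechanism. Everything in it is an assumption made for the sketch: the `Replica` class, the `faulty` flag used to inject a fault, the `run_step` helper, and the choice of majority voting for location and state copying for recovery.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical replica of one federate (not from the paper)."""
    name: str
    state: int = 0
    faulty: bool = False  # assumption: flag used only to inject a fault

    def step(self, msg: int) -> int:
        self.state += msg
        # A Byzantine replica may emit an arbitrary wrong value.
        return self.state + 1000 if self.faulty else self.state


def run_step(replicas, msg):
    """Run one step on all replicas; return (agreed output, located replicas)."""
    outputs = {r.name: r.step(msg) for r in replicas}

    # Phase 1, detection: any disagreement among replica outputs.
    if len(set(outputs.values())) == 1:
        return next(iter(outputs.values())), []

    # Phase 2, location: replicas that differ from the strict majority.
    majority_value, votes = Counter(outputs.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no strict majority; fault cannot be located")
    located = [name for name, value in outputs.items() if value != majority_value]

    # Phase 3, recovery: restore located replicas from a correct replica's
    # state (a stand-in for restoring from a checkpoint).
    good = next(r for r in replicas if outputs[r.name] == majority_value)
    for r in replicas:
        if r.name in located:
            r.state = good.state
            r.faulty = False
    return majority_value, located


replicas = [Replica("a"), Replica("b"), Replica("c", faulty=True)]
first = run_step(replicas, 5)   # fault is detected, located, and repaired
second = run_step(replicas, 2)  # all replicas agree again
print(first, second)
```

With three replicas, a strict majority tolerates one faulty replica; the sketch raises an error rather than guessing when no strict majority exists.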
Pages: 90-107
Number of pages: 18