Customizable fault tolerance for wide-area replication

Cited by: 15
Authors
Amir, Yair [1 ]
Coan, Brian [2 ]
Kirsch, Jonathan [1 ]
Lane, John [1 ]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Telcordia Technol, Piscataway, NJ USA
Source
SRDS 2007: 26TH IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS | 2007
Funding
National Science Foundation (USA)
Keywords
DOI
10.1109/SRDS.2007.29
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. We present a new, scalable replication architecture, built upon logical machines specifically designed to perform well in wide-area systems spanning multiple sites. The physical machines in each site implement a logical machine by running a local state machine replication protocol, and a wide-area replication protocol runs among the logical machines. Implementing logical machines via the state machine approach affords free substitution of the fault tolerance method used in each site and in the wide-area replication protocol, allowing one to balance performance and fault tolerance based on perceived risk. We present a new Byzantine fault-tolerant protocol that establishes a reliable virtual communication link between logical machines. Our communication protocol is efficient (a necessity in wide-area environments), avoiding the need for redundant message sending during normal-case operation and allowing a logical machine to consume approximately the same wide-area bandwidth as a single physical machine. This dramatically improves the wide-area performance of our system compared to existing logical machine based approaches. We implemented a prototype system and compare its performance and fault tolerance to existing solutions.
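The logical-machine idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: the `Replica`, `LogicalMachine`, and `wide_area_copies` names are hypothetical, the state machine is a toy counter, replicas are assumed to already receive commands in the same order, and the `f + 1` redundant-sending baseline is an assumption used only for comparison. The sketch relies on the standard result that `3f + 1` replicas can tolerate `f` Byzantine faults, and that `f + 1` matching outputs must include one from a correct replica.

```python
from collections import Counter


class Replica:
    """One physical machine running a deterministic state machine (a toy counter)."""

    def __init__(self, faulty=False):
        self.state = 0
        self.faulty = faulty

    def apply(self, command):
        self.state += command
        # A faulty replica is modelled as always reporting a bogus value.
        return -1 if self.faulty else self.state


class LogicalMachine:
    """One site: 3f + 1 replicas applying the same ordered commands (assumed ordering)."""

    def __init__(self, f, faulty_count=0):
        self.f = f
        self.replicas = [Replica(faulty=(i < faulty_count)) for i in range(3 * f + 1)]

    def execute(self, command):
        outputs = [replica.apply(command) for replica in self.replicas]
        value, votes = Counter(outputs).most_common(1)[0]
        # With at most f faulty replicas, f + 1 matching outputs
        # include at least one produced by a correct replica.
        if votes < self.f + 1:
            raise RuntimeError("no output backed by f + 1 replicas")
        return value


def wide_area_copies(f, redundant):
    """Copies of one message crossing the wide-area link.

    redundant=True is a hypothetical baseline in which f + 1 replicas each send;
    redundant=False is the single-sender normal case the abstract aims for.
    """
    return f + 1 if redundant else 1
```

For example, `LogicalMachine(f=1, faulty_count=1).execute(5)` returns the correct counter value even though one of the four replicas reports a bogus output, and `wide_area_copies(2, redundant=False)` shows the single-copy cost that lets a logical machine use about the same wide-area bandwidth as one physical machine.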
Pages: 66 / +
Page count: 3