Customizable fault tolerance for wide-area replication

Cited by: 15
Authors
Amir, Yair [1]
Coan, Brian [2]
Kirsch, Jonathan [1]
Lane, John [1]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Telcordia Technol, Piscataway, NJ USA
Source
SRDS 2007: 26TH IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS | 2007
Funding
U.S. National Science Foundation
DOI
10.1109/SRDS.2007.29
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. We present a new, scalable replication architecture, built upon logical machines specifically designed to perform well in wide-area systems spanning multiple sites. The physical machines in each site implement a logical machine by running a local state machine replication protocol, and a wide-area replication protocol runs among the logical machines. Implementing logical machines via the state machine approach affords free substitution of the fault tolerance method used in each site and in the wide-area replication protocol, allowing one to balance performance and fault tolerance based on perceived risk. We present a new Byzantine fault-tolerant protocol that establishes a reliable virtual communication link between logical machines. Our communication protocol is efficient (a necessity in wide-area environments), avoiding the need for redundant message sending during normal-case operation and allowing a logical machine to consume approximately the same wide-area bandwidth as a single physical machine. This dramatically improves the wide-area performance of our system compared to existing logical machine based approaches. We implemented a prototype system and compared its performance and fault tolerance with existing solutions.
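The per-site choice of fault model and the bandwidth claim can be illustrated with a small accounting sketch. It is not the authors' protocol. The names `FaultModel`, `replicas_needed`, `LogicalMachine` and `wan_copies_per_message` are hypothetical. The "redundant" strategy is an assumed generic baseline (f + 1 sender replicas each send a copy), not a model of any specific prior system. The "single" strategy simply assumes the one forwarded copy carries proof, for example a threshold signature, that the whole logical machine agreed on it. Replica counts use the standard bounds of 2f + 1 machines for crash faults and 3f + 1 for Byzantine faults.

```python
from dataclasses import dataclass
from enum import Enum


class FaultModel(Enum):
    """Fault model chosen for one site's local state machine replication."""
    CRASH = "crash"          # e.g. a Paxos-style local protocol
    BYZANTINE = "byzantine"  # e.g. a PBFT-style local protocol


def replicas_needed(model: FaultModel, f: int) -> int:
    """Minimum physical machines needed to tolerate f local faults.

    Standard bounds: 2f + 1 for crash faults, 3f + 1 for Byzantine faults.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1 if model is FaultModel.CRASH else 3 * f + 1


@dataclass(frozen=True)
class LogicalMachine:
    """A site whose physical machines jointly act as one logical machine."""
    name: str
    model: FaultModel
    f: int

    @property
    def physical_machines(self) -> int:
        return replicas_needed(self.model, self.f)


def wan_copies_per_message(sender: LogicalMachine, strategy: str) -> int:
    """Copies of one logical message that cross the wide-area link.

    "redundant" (hypothetical baseline): f + 1 sender replicas each send a
        copy, so at least one copy comes from a correct replica.
    "single" (hypothetical): one forwarder sends one copy, assumed to carry
        proof that the whole logical machine agreed on its contents.
    """
    if strategy == "redundant":
        return sender.f + 1
    if strategy == "single":
        return 1
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Each site picks its own fault model based on perceived risk.
    sites = [
        LogicalMachine("site-A", FaultModel.BYZANTINE, f=1),
        LogicalMachine("site-B", FaultModel.CRASH, f=2),
    ]
    for s in sites:
        print(s.name, s.model.value, s.physical_machines,
              wan_copies_per_message(s, "redundant"),
              wan_copies_per_message(s, "single"))
```

Running it prints `site-A byzantine 4 2 1` and `site-B crash 5 3 1`: the number of physical machines inside a site depends on its locally chosen fault model, while under the assumed single-forwarder scheme each logical message costs one wide-area copy, the same as a single physical machine.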
Pages: 66+
Page count: 3
Related papers
50 in total
  • [21] Application of synchronised phasor measurements to wide-area fault diagnosis and location
    Salehi-Dobakhshari, Ahmad
    Ranjbar, Ali Mohammad
    IET GENERATION TRANSMISSION & DISTRIBUTION, 2014, 8 (04): 716-729
  • [22] A Procedure to Design Fault-Tolerant Wide-Area Damping Controllers
    Bento, Murilo E. C.
    Dotta, Daniel
    Kuiava, Roman
    Ramos, Rodrigo A.
    IEEE ACCESS, 2018, 6: 23383-23405
  • [24] Wide-area fault location method considering gross measurement errors
    Hosseini, Seyed Ali
    Sadeh, Javad
    Mozafari, Babak
    IET GENERATION TRANSMISSION & DISTRIBUTION, 2017, 11 (18): 4670-4679
  • [25] Wide-Area Measurement Assisted Fault Location Algorithm for Smart Grid
    Raskar, Shivaji
    Gawande, Prashant
    Dambhare, Sanjay
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON ENVIRONMENT AND ELECTRICAL ENGINEERING AND 2021 5TH IEEE INDUSTRIAL AND COMMERCIAL POWER SYSTEMS EUROPE (EEEIC/I&CPS EUROPE), 2021
  • [26] Distribution Fault Location Using Wide-Area Voltage Magnitude Measurements
    Hossain, Shamina
    Zhu, Hao
    Overbye, Thomas
    2013 NORTH AMERICAN POWER SYMPOSIUM (NAPS), 2013
  • [27] Implementation of a customizable fault tolerance framework
    Yen, IL
    Ahmed, I
    Jagannath, R
    Kundu, S
    FIRST INTERNATIONAL SYMPOSIUM ON OBJECT-ORIENTED REAL-TIME DISTRIBUTED COMPUTING (ISORC '98), 1998: 230-239
  • [28] Fault-tolerant Wide-area Control for Power Oscillation Damping
    Sevilla, Felix Rafael Segundo
    Jaimoukha, Imad
    Chaudhuri, Balarko
    Korba, Petr
    2012 IEEE POWER AND ENERGY SOCIETY GENERAL MEETING, 2012
  • [29] The wide world of wide-area measurement
    Phadke, A. G.
    Volskis, Hector
    de Moraes, Rui Menezes
    Bi, Tianshu
    Nayak, R. N.
    Sehgal, Y. K.
    Sen, Subir
    Sattinger, Walter
    Martinez, Enrique
    Samuelsson, Olof
    Novosel, Damir
    Madani, Vahid
    Kulikov, Yuri A.
    IEEE POWER & ENERGY MAGAZINE, 2008, 6 (05): 52-65
  • [30] Achieving Reliability through Replication in a Wide-Area Network DHT Storage System
    Zhao, Jing
    Yu, Hongliang
    Zhang, Kun
    Zheng, Weimin
    Wu, Jie
    Hu, Jinfeng
    2007 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP), 2007: 241+