Software-based replication for fault tolerance

Cited by: 143
Authors
Guerraoui, R
Schiper, A
Affiliations
[1] Federal Institute of Technology, Lausanne
[2] Department of Computer Science, EPFL, Operating Systems Laboratory
[3] Département d'Informatique, École Polytechnique Fédérale de Lausanne
DOI
10.1109/2.585156
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Developers of early distributed systems took a simplistic approach to providing fault tolerance: they just used another copy of the same hardware as a backup. Later, others developed replication software to work on off-the-shelf hardware. Since neither of these methods is especially economical, a logical course is to take it one step further and eliminate the extra hardware altogether. Fully software-based replication relies on sophisticated techniques to keep track of server communications and ensure the consistency of information across several server replicas. How do you know that each server shares the same view of the data or program semantics? What happens if a server replica crashes? How do you make sure that a system processes invocations in the correct order? These are all problems that a replication technique has to handle. The authors describe two fundamental techniques, primary backup and active replication, and illustrate how they handle these problems. At this point, both have advantages and disadvantages that depend on the application. The authors also propose that group communication provides a sufficient framework for implementing software-based replication. The concept of static and dynamic groups proves useful in thinking about how to implement replication techniques. Replication techniques can also use total-order and view-synchronous multicast primitives from group communication.
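As a rough illustration of the two techniques named in the abstract, the Python sketch below contrasts active replication with primary backup on a toy key-value state machine. It is not the authors' code. The names `Replica`, `active_replication`, and `primary_backup` are hypothetical, an in-memory list stands in for a total-order multicast, and failure detection is reduced to an `alive` flag.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A deterministic key-value state machine (hypothetical, for illustration)."""
    name: str
    state: dict = field(default_factory=dict)
    alive: bool = True

    def apply(self, op):
        """Execute one invocation (key, value) and return a snapshot of the state."""
        key, value = op
        self.state[key] = value
        return dict(self.state)


def active_replication(replicas, ordered_ops):
    """Every live replica executes every invocation, all in the same order.

    Assumption: iterating over `ordered_ops` stands in for total-order
    multicast, so all replicas see the invocations in an identical sequence.
    """
    for op in ordered_ops:
        for replica in replicas:
            if replica.alive:
                replica.apply(op)


def primary_backup(replicas, ops):
    """Only the primary executes; it then pushes the resulting state to backups.

    Assumption: the first live replica in the list acts as primary, which
    is a simplification of a real view change after a crash.
    """
    for op in ops:
        live = [r for r in replicas if r.alive]
        primary, backups = live[0], live[1:]
        snapshot = primary.apply(op)
        for backup in backups:
            backup.state = dict(snapshot)  # state update sent to each backup
```

With three replicas and the invocations `[("x", 1), ("y", 2), ("x", 3)]`, `active_replication` leaves every replica holding the same state because each one applies the same deterministic operations in the same order. In `primary_backup`, marking the primary as not alive makes the next live replica take over with the state it last received.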
Pages: 68 / +