RESPONDING TO CATASTROPHIC ERRORS - A DESIGN TECHNIQUE FOR FAULT-TOLERANT SOFTWARE

Cited by: 0
Authors
DAVIS, FGF [1 ]
GANTENBEIN, RE [1 ]
Affiliation
[1] UNIV WYOMING,DEPT COMP SCI,OPERATING SYST LAB,LARAMIE,WY 82071
DOI
10.1016/0164-1212(92)90113-X
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
The usual classification of software-caused system errors as internal, external, or pervasive assumes a rippling propagation of errors through a hierarchy of structures. As a result, most fault-tolerant software handles errors through nested detection and recovery mechanisms. In many cases, particularly in distributed systems, this assumption may not hold; catastrophic errors may occur that can evade the boundaries of the usual mechanisms and cause large-scale system failure. System designers must consider the possibility of failure from the first stages of system development, define the circumstances under which these failures might occur, and analyze the costs of dealing with such failures. Fault-tolerance techniques can be applied to reduce the effect of catastrophic errors. One such technique, dynamic reconfiguration, is described here as an example of a practical way for a system to respond to a detected error. Dynamic reconfiguration can be used not only to recover from software errors but also to remove the faults that caused the errors. An example of the design of a life-critical software system using dynamic reconfiguration to handle potentially catastrophic errors is presented.
Pages: 243-251
Number of pages: 9
Related papers
50 in total
  • [21] Using Petri nets for the design of conversation boundaries in fault-tolerant software
    Wu, Jie, 1600, IEEE, Piscataway, NJ, United States (05):
  • [22] A Fault-tolerant Sequential Circuit Design for Soft Errors Based on Fault-Secure Circuit
    Ostanin, S.
    Matrosova, A.
    Butorina, N.
    Lavrov, V.
    PROCEEDINGS OF 2016 IEEE EAST-WEST DESIGN & TEST SYMPOSIUM (EWDTS), 2016,
  • [23] EVALUATION AND COMPARISON OF FAULT-TOLERANT SOFTWARE TECHNIQUES
    HUDAK, J
    SUH, BH
    SIEWIOREK, D
    SEGALL, Z
    IEEE TRANSACTIONS ON RELIABILITY, 1993, 42 (02) : 190 - 204
  • [24] HARDWARE AND SOFTWARE FOR FAULT-TOLERANT COMPUTING SYSTEMS
    SOGOMONYAN, ES
    SHAGAEV, IV
    AUTOMATION AND REMOTE CONTROL, 1988, 49 (02) : 129 - 151
  • [25] Aspects for improvement of performance in fault-tolerant software
    Szentiványi, D
    Nadjm-Tehrani, S
    10TH IEEE PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING, PROCEEDINGS, 2004, : 283 - 291
  • [26] Achieving fault-tolerant software with rejuvenation and reconfiguration
    Yurcik, W
    Doss, D
    IEEE SOFTWARE, 2001, 18 (04) : 48 - +
  • [27] RELIABILITY-GROWTH OF FAULT-TOLERANT SOFTWARE
    KANOUN, K
    KAANICHE, M
    BEOUNES, C
    LAPRIE, JC
    ARLAT, J
    IEEE TRANSACTIONS ON RELIABILITY, 1993, 42 (02) : 205 - 219
  • [28] Reliability simulation of fault-tolerant software and systems
    Gokhale, SS
    Lyu, MR
    Trivedi, KS
    PACIFIC RIM INTERNATIONAL SYMPOSIUM ON FAULT-TOLERANT SYSTEMS, PROCEEDINGS, 1997, : 167 - 173
  • [29] The effect of testing on reliability of fault-tolerant software
    Popov, P
    Littlewood, B
    2004 INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS, 2004, : 265 - 274
  • [30] Optimal structure of fault-tolerant software systems
    Levitin, G
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2005, 89 (03) : 286 - 295