Error detection and diagnosis for fault tolerance in distributed systems

Cited by: 5
Authors
Saleh, K [1]
Al-Saqabi, K [1]
Affiliations
[1] Kuwait Univ, Dept Elect & Comp Engn, Safat 13060, Kuwait
Keywords
communications software; detection diagnosis; distributed systems; fault tolerance
DOI
10.1016/S0950-5849(97)00058-X
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Early error detection, together with an understanding of the nature and conditions of an error's occurrence, supports effective and efficient recovery in distributed systems. Various distributed system extensions have been introduced to implement fault tolerance in distributed software systems. These extensions rely mainly on the exchange of contextual information appended to every transmitted application-specific message. Ideally, this information is used for checkpointing, error detection, diagnosis, and recovery should a transient failure occur later during execution of the distributed program. In this paper, we present a generalized extension suitable for fault-tolerant distributed systems such as communication software systems, and we show its detection capabilities. Our extension is based on executing a message validity test prior to the transmission of messages and on piggybacking contextual information to facilitate the detection and diagnosis of transient faults in the distributed system. © 1998 Elsevier Science B.V.
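The two mechanisms named in the abstract, a validity test before transmission and contextual information piggybacked on each message, can be illustrated with a minimal sketch. This is not the authors' extension: the context fields (`sender`, `seq`, `checkpoint_id`), the class names, and the gap/duplicate check on the receiving side are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Context:
    """Hypothetical piggybacked context; the field choice is an assumption."""
    sender: str
    seq: int            # per-sender sequence number
    checkpoint_id: int  # sender's most recent local checkpoint


@dataclass
class Envelope:
    """An application message with its piggybacked context."""
    payload: Any
    context: Context


class Sender:
    def __init__(self, name: str, is_valid: Callable[[Any], bool]) -> None:
        self.name = name
        self.is_valid = is_valid  # application-supplied validity test
        self.seq = 0
        self.checkpoint_id = 0

    def send(self, payload: Any) -> Optional[Envelope]:
        # Run the validity test before transmission; an invalid
        # message is never sent and consumes no sequence number.
        if not self.is_valid(payload):
            return None
        self.seq += 1
        ctx = Context(self.name, self.seq, self.checkpoint_id)
        return Envelope(payload, ctx)


class Receiver:
    def __init__(self) -> None:
        self.last_seq: Dict[str, int] = {}
        self.diagnoses: List[str] = []

    def receive(self, env: Envelope) -> bool:
        # Use the piggybacked context to detect a lost or repeated
        # message and to record a diagnosis for later recovery.
        ctx = env.context
        expected = self.last_seq.get(ctx.sender, 0) + 1
        if ctx.seq != expected:
            kind = "duplicate" if ctx.seq < expected else "gap"
            self.diagnoses.append(
                f"{kind} from {ctx.sender}: expected seq {expected}, "
                f"got {ctx.seq} (checkpoint {ctx.checkpoint_id})"
            )
            return False
        self.last_seq[ctx.sender] = ctx.seq
        return True
```

In this sketch the receiver rejects an out-of-order message and records which sender, sequence number, and checkpoint were involved. A recovery layer could use that record, but recovery itself is outside what the sketch covers.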
Pages: 975-983
Page count: 9