Error detection and diagnosis for fault tolerance in distributed systems

Cited by: 5
Authors
Saleh, K [1 ]
Al-Saqabi, K [1 ]
Affiliations
[1] Kuwait Univ, Dept Elect & Comp Engn, Safat 13060, Kuwait
Keywords
communications software; detection diagnosis; distributed systems; fault tolerance
DOI
10.1016/S0950-5849(97)00058-X
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Early error detection and an understanding of the nature and conditions of an error occurrence can help make recovery in distributed systems effective and efficient. Various distributed system extensions have been introduced for the implementation of fault tolerance in distributed software systems. These extensions rely mainly on the exchange of contextual information appended to every transmitted application-specific message. Ideally, this information should be used for checkpointing, error detection, diagnosis and recovery should a transient failure occur later during the distributed program execution. In this paper, we present a generalized extension suitable for fault-tolerant distributed systems such as communication software systems, and we show its detection capabilities. Our extension is based on executing a message validity test prior to the transmission of messages and on piggybacking contextual information to facilitate the detection and diagnosis of transient faults in the distributed system. (C) 1998 Elsevier Science B.V.
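The two mechanisms named in the abstract, a validity test before transmission and contextual information piggybacked on each message, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the Context fields (sender, seq, checkpoint_id), the Process class, the allow-list validity test, and the sequence-gap check on the receiver side are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical piggybacked context; the fields are assumptions."""
    sender: str
    seq: int            # per-sender sequence number
    checkpoint_id: int  # sender's most recent local checkpoint


@dataclass(frozen=True)
class Message:
    payload: str
    context: Context


class Process:
    """Hypothetical process that validates before sending and checks context on receipt."""

    def __init__(self, name: str, allowed_payloads: Iterable[str]) -> None:
        self.name = name
        self.allowed_payloads = set(allowed_payloads)
        self.next_seq = 0
        self.checkpoint_id = 0
        self.last_seen: Dict[str, int] = {}  # sender -> last accepted seq

    def is_valid(self, payload: str) -> bool:
        # Stand-in validity test: an allow-list of message types.
        return payload in self.allowed_payloads

    def send(self, payload: str) -> Message:
        # Run the validity test before the message leaves the process.
        if not self.is_valid(payload):
            raise ValueError(f"{self.name}: invalid message {payload!r}")
        ctx = Context(self.name, self.next_seq, self.checkpoint_id)
        self.next_seq += 1
        return Message(payload, ctx)

    def receive(self, msg: Message) -> Optional[str]:
        # Return a diagnosis string if the context reveals a fault, else None.
        ctx = msg.context
        expected = self.last_seen.get(ctx.sender, -1) + 1
        if ctx.seq != expected:
            return (f"sequence gap from {ctx.sender}: expected {expected}, "
                    f"got {ctx.seq} (sender checkpoint {ctx.checkpoint_id})")
        self.last_seen[ctx.sender] = ctx.seq
        return None
```

In this sketch, a message that fails the sender-side test never enters the network, and a lost message shows up at the receiver as a gap in the piggybacked sequence numbers, with the checkpoint identifier offered as a possible rollback point.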
Pages: 975-983
Page count: 9
Related papers
50 in total
  • [31] Error Detection and Fault Tolerance in ECSM Using Input Randomization
    Dominguez-Oviedo, Agustin
    Hasan, M. Anwar
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2009, 6 (03) : 175 - 187
  • [32] Robust Distributed Sensor Fault Detection and Diagnosis Within Formation Control of Multiagent Systems
    Zhong, Yujiang
    Zhang, Youmin
    Ge, Shuzhi Sam
    He, Xiao
    IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2023, 59 (02) : 1340 - 1353
  • [33] Fault Detection and Diagnosis of Distributed Parameter Systems Based on Sensor Networks and Artificial Intelligence
    Volosencu, Constantin
    ISPRA '09: PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, ROBOTICS AND AUTOMATION, 2010, : 200 - +
  • [34] On Fault Detection and Diagnosis in Robotic Systems
    Khalastchi, Eliahu
    Kalech, Meir
    ACM COMPUTING SURVEYS, 2018, 51 (01)
  • [35] Fault detection and diagnosis of technical systems
    Fagarasan, Ioana
    St Iliescu, S.
    PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE ON AUTOMATION AND INFORMATION, 2008, : 446 - 453
  • [36] A model-based fault detection and diagnosis scheme for distributed parameter systems: A learning systems approach
    Demetriou, MA
    ESAIM-CONTROL OPTIMISATION AND CALCULUS OF VARIATIONS, 2002, 7 (03) : 43 - 67
  • [37] Implicit Intermittent Fault Detection in Distributed Systems
    Waszecki, Peter
    Kauer, Matthias
    Lukasiewycz, Martin
    Chakraborty, Samarjit
    2014 19TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2014, : 646 - 651
  • [38] ERROR TOLERANCE IN DISTRIBUTED SYSTEMS - THE DELTA-4 APPROACH
    POWELL, D
    MARTIN, P
    SEATON, D
    TSI-TECHNIQUE ET SCIENCE INFORMATIQUES, 1987, 6 (02) : 197 - 200
  • [39] Fault tolerance in distributed systems using deep learning approaches
    Assiri, Basem
    Sheneamer, Abdullah
    PLOS ONE, 2025, 20 (01)
  • [40] Fault-tolerance in distributed real-time systems
    Jahanian, F
    THIRD INTERNATIONAL WORKSHOP ON REAL-TIME COMPUTING SYSTEMS AND APPLICATIONS, PROCEEDINGS, 1996, : 178 - 178