Fault Tolerance in Heterogeneous Distributed Systems

Times cited: 1
Authors
Wang, Zhe [1]
Minsky, Naftaly H. [1]
Affiliation
[1] Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ 08901 USA
Keywords
MODEL; COORDINATION
DOI
10.4108/icst.collaboratecom.2014.257585
Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes
0808; 0809
Abstract
Dependability of heterogeneous distributed systems is an important issue. Coordination failures may occur even if all participants adhere to the given coordination protocol. Fault tolerance (FT) is difficult to achieve, especially at the application level. What current FT techniques have in common is their reliance on the code of the various system components, which are often required to be written in a specific language. Such techniques are feasible for homogeneous systems, or at least for systems designed and maintained within a single administrative domain. But code-based techniques are generally unreliable for open systems, because no one has overall control over the code of their components. This leaves open distributed systems vulnerable to their own faults and to attacks on them. However, certain types of FT measures can be established in distributed systems by controlling the flow of messages between system components, independently of the components' code, as we propose to do via a distributed coordination and control mechanism called Law-Governed Interaction (LGI). In this paper we demonstrate that a substantial range of FT measures can be established entirely by controlling messaging. Moreover, although the FT measures developed here are meant mostly for open systems, some of them can be useful for distributed systems in general, even where traditional code-based techniques are feasible.
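To make the central idea concrete (an FT measure imposed purely by mediating messages, without touching the components' code), here is a minimal Python sketch. It is an illustration under stated assumptions, not the LGI mechanism or its law language: the names `Message`, `Controller`, `ft_law`, and `TIMEOUT`, the message kinds, and the use of logical clock ticks are all hypothetical. The sample law tracks outstanding requests, marks a server as suspected once a request goes unanswered past a deadline, notifies the requester, and short-circuits later requests to that server with a failure notice.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    sender: str
    receiver: str
    kind: str            # hypothetical kinds: "request", "reply", "tick"
    payload: object = None


TIMEOUT = 3  # hypothetical deadline, measured in logical clock ticks


def ft_law(state, msg):
    """Hypothetical 'law': decides what happens to each message.

    Returns a list of (action, argument) pairs, where action is
    "deliver" (argument is a Message) or "drop" (argument is a reason).
    """
    pending = state["pending"]      # (client, server) -> tick when request was sent
    suspected = state["suspected"]  # servers presumed to have failed
    if msg.kind == "tick":
        state["now"] = msg.payload
        actions = []
        for (client, server), sent_at in list(pending.items()):
            if state["now"] - sent_at > TIMEOUT:
                # Unanswered past the deadline: suspect the server, tell the client.
                del pending[(client, server)]
                suspected.add(server)
                actions.append(("deliver", Message("controller", client, "failure-notice", server)))
        return actions
    if msg.kind == "request":
        if msg.receiver in suspected:
            # Do not forward to a suspected server; answer on its behalf.
            return [("deliver", Message("controller", msg.sender, "failure-notice", msg.receiver))]
        pending[(msg.sender, msg.receiver)] = state["now"]
        return [("deliver", msg)]
    if msg.kind == "reply":
        pending.pop((msg.receiver, msg.sender), None)  # request answered in time
        return [("deliver", msg)]
    return [("drop", f"unknown message kind {msg.kind!r}")]


class Controller:
    """Hypothetical mediator: every message passes through the law,
    so the rule holds regardless of how the agents themselves are coded."""

    def __init__(self, law):
        self.law = law
        self.state = {"now": 0, "pending": {}, "suspected": set()}
        self.inboxes: Dict[str, List[Message]] = {}
        self.log: List[str] = []

    def send(self, msg: Message) -> None:
        for action, arg in self.law(self.state, msg):
            if action == "deliver":
                self.inboxes.setdefault(arg.receiver, []).append(arg)
            else:
                self.log.append(arg)


# Usage: "cache" never replies, so it becomes suspected after the deadline.
ctl = Controller(ft_law)
ctl.send(Message("alice", "db", "request", "read x"))
ctl.send(Message("db", "alice", "reply", 42))
ctl.send(Message("alice", "cache", "request", "get y"))
ctl.send(Message("clock", "controller", "tick", 5))
ctl.send(Message("bob", "cache", "request", "get z"))
```

In this sketch `alice` ends up with the reply from `db` followed by a failure notice about `cache`, and `bob` receives a failure notice instead of waiting on a server already presumed failed. Because detection is timeout-based, "suspected" is the honest term: a slow server is indistinguishable from a crashed one at this level.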
Pages: 539-545
Number of pages: 7