Dynamic fault tolerance in distributed simulation system

Authors
Ma, Min [1]
Jin, Shiyao [1]
Ye, Chaoqun [1]
Liu, Xiaojian [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci, Changsha 410073, Hunan, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Distributed simulation systems are widely used for forecasting, decision-making, and scientific computing, and multi-agent systems and Grids have been used as simulation platforms. To survive software or hardware failures and to guarantee a high success rate during agent migration, such a system must solve the fault tolerance problem. Classic fault tolerance techniques such as checkpointing and redundancy can be applied to distributed simulation systems, but they are not efficient. We present a novel fault tolerance protocol that combines causal message logging with primary-backup technology. The proposed protocol uses an iterative backup location scheme and an adaptive update interval to reduce overhead and to balance the cost of fault tolerance against recovery time. The protocol produces no orphan states and does not require surviving agents to roll back. Most importantly, the recovery scheme can tolerate concurrent failures, and even the permanent failure of a single node. The correctness of the protocol is proved, and experiments show that the protocol is efficient.
Pages: 769-776
Page count: 8
Related papers
  • [1] Dynamic fault tolerance in distributed vehicle systems
    Torlo, M
    Bertram, T
    ELECTRONIC SYSTEMS FOR VEHICLES, 2001, 1646 : 99 - 122
  • [2] Fault Tolerance Model for Hadoop Distributed System
    Ahmed, Soraya Setti
    Slimani, Yahya
    Frefita, Riadh
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2025, 31 (01) : 72 - 92
  • [3] Fault tolerance for distributed process control system
    Takizawa, H
    SICE 2002: PROCEEDINGS OF THE 41ST SICE ANNUAL CONFERENCE, VOLS 1-5, 2002, : 3259 - 3263
  • [4] Fault tolerance in a distributed CHORUS/MiX system
    Kittur, S
    Steel, D
    Armand, F
    Lipkis, J
    PROCEEDINGS OF THE USENIX 1996 ANNUAL TECHNICAL CONFERENCE, 1996, : 219 - 228
  • [5] Design and Realization of a Fault-Tolerance Model to Distributed Simulation System of Hydropower Plant
    Zhang, Binqiao
    Wu, Chengming
    Li, Xianshan
    Wang, Pengyu
    Liu, Rongzhang
    2012 WORLD AUTOMATION CONGRESS (WAC), 2012,
  • [6] Symmetric distributed computing with dynamic load balancing and fault tolerance
    Bubeck, T
    Kuchlin, W
    Rosenstiel, W
    LANGUAGES, COMPILERS AND RUN-TIME SYSTEMS FOR SCALABLE COMPUTERS, 1996, : 325 - 328
  • [7] A Fault Tolerance Mechanism in Distributed and Complex System on a LAN
    Lassoued, Farid
    Bouallegue, Ridha
    PROCEEDINGS OF THE MEDITERRANEAN CONFERENCE ON INFORMATION & COMMUNICATION TECHNOLOGIES 2015 (MEDCT 2015), VOL 2, 2016, 381 : 563 - 568
  • [8] A Distributed Fault Tolerance Mechanism for an IoT Healthcare system
    Zaiter, Meriem
    Hacini, Salima
    Moussa, Guedrez
    2020 21ST INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2020,
  • [9] Fault Tolerance Mechanism of a Distributed System for Marine Communication Network
    Sun, Jingwei
    JOURNAL OF COASTAL RESEARCH, 2020, : 605 - 608
  • [10] Fault-tolerance in the Borealis distributed stream processing system
    Balazinska, Magdalena
    Balakrishnan, Hari
    Madden, Samuel R.
    Stonebraker, Michael
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2008, 33 (01):