Dynamic fault tolerance in distributed simulation system

Authors
Ma, Min [1]
Jin, Shiyao [1]
Ye, Chaoqun [1]
Liu, Xiaojian [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci, Changsha 410073, Hunan, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Distributed simulation systems are widely used for forecasting, decision-making, and scientific computing. Multi-agent systems and the Grid have been used as platforms for simulation. To survive software or hardware failures and to guarantee the success rate of agent migration, such a system must solve the fault tolerance problem. Classic fault tolerance techniques such as checkpointing and redundancy can be applied to distributed simulation systems, but they are not efficient. We present a novel fault tolerance protocol that combines causal message logging with primary-backup technology. The proposed protocol uses an iterative backup location scheme and an adaptive update interval to reduce overhead and to balance the cost of fault tolerance against recovery time. The protocol produces no orphan states and does not require surviving agents to roll back. Most importantly, the recovery scheme can tolerate concurrent failures, even the permanent failure of a single node. The correctness of the protocol is proved, and experiments show that the protocol is efficient.
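The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea of pairing message logging with a backup copy of an agent's state. It is not the authors' protocol: it omits causal piggybacking of determinants, the iterative backup location scheme, and the adaptive update interval (a fixed interval is used instead). The names `Backup`, `Agent`, `deliver`, `recover`, and `update_interval` are hypothetical, and the integer "state" stands in for real simulation state.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup node: last state update plus messages logged since."""
    checkpoint: int = 0
    log: list = field(default_factory=list)

    def record(self, msg: int) -> None:
        # Keep every message delivered since the last state update.
        self.log.append(msg)

    def update(self, state: int) -> None:
        # A state update supersedes the logged messages, so the log is cleared.
        self.checkpoint = state
        self.log.clear()


@dataclass
class Agent:
    """Hypothetical simulation agent with a deterministic message handler."""
    backup: Backup
    update_interval: int = 3  # assumption: fixed here, adaptive in the paper
    state: int = 0
    since_update: int = 0

    def deliver(self, msg: int) -> None:
        self.backup.record(msg)   # log the message before applying it
        self.state += msg         # deterministic state transition
        self.since_update += 1
        if self.since_update >= self.update_interval:
            self.backup.update(self.state)
            self.since_update = 0


def recover(backup: Backup, update_interval: int = 3) -> Agent:
    """Rebuild a failed agent by replaying the log on top of the backup state.

    Only the failed agent is reconstructed; no other agent is rolled back.
    """
    agent = Agent(backup=backup, update_interval=update_interval,
                  state=backup.checkpoint)
    for msg in backup.log:
        agent.state += msg        # replay in the original delivery order
    agent.since_update = len(backup.log)
    return agent
```

As a usage example, delivering the messages 1 through 5 to an `Agent` and then calling `recover` on its `Backup` rebuilds an agent whose state matches the original. A shorter `update_interval` means fewer messages to replay during recovery but more frequent state transfers to the backup, which is the kind of cost balance the abstract refers to.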
Pages: 769-776
Page count: 8