Dynamic fault tolerance in distributed simulation system

Cited by: 0
Authors
Ma, Min [1]
Jin, Shiyao [1]
Ye, Chaoqun [1]
Liu, Xiaojian [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci, Changsha 410073, Hunan, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
Distributed simulation systems are widely used for forecasting, decision-making, and scientific computing. Multi-agent systems and Grids have been used as platforms for simulation. To survive software or hardware failures and to guarantee the success rate of agent migration, the system must solve the fault tolerance problem. Classic fault tolerance techniques such as checkpointing and redundancy can be applied to distributed simulation systems, but they are not efficient. We present a novel fault tolerance protocol that combines causal message logging with primary-backup technology. The proposed protocol uses an iterative backup location scheme and an adaptive update interval to reduce overhead and to balance the cost of fault tolerance against recovery time. The protocol produces no orphan states and does not require the surviving agents to roll back. Most importantly, the recovery scheme can tolerate concurrent failures, even the permanent failure of a single node. The correctness of the protocol is proved, and experiments show that the protocol is efficient.
Pages: 769-776
Page count: 8