A Distributed Counter-based Non-blocking Coordinated Checkpoint Algorithm for Grid Computing Applications

Cited by: 0
Authors
El-Sayed, Gamal A. [1]
Hossny, Khadra A. [1]
Affiliations
[1] Assiut Univ, Dept Elect Engn, Assiut, Egypt
Keywords
fault tolerance; distributed systems; coordinated checkpointing; consistent state
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In distributed systems, there are many opportunities for failure. Any component in any compute node can fail, including, but not limited to, the processor, disk, memory, or network interface. Any of these failures will cause the processes running on the affected nodes to crash or produce incorrect results. The common method of ensuring that these processes make progress is to take checkpoints; this becomes more complicated when the processes communicate with one another. This paper presents a distributed non-blocking coordinated checkpointing algorithm that guarantees globally consistent checkpoint images. These consistent checkpoint images can be used to migrate application processes to different computing nodes when a failure occurs.
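The abstract does not describe the algorithm itself, so the sketch below is not the authors' method. It only illustrates the standard notion of a globally consistent set of checkpoints: no checkpoint may record the receipt of a message that the sender's checkpoint has not recorded as sent (no orphan messages). The per-peer `sent` and `recv` counters, the dictionary layout, and the function name `is_consistent` are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical layout: each process's checkpoint stores, per peer, how many
# messages it had sent to and received from that peer when the checkpoint
# was taken, e.g. {"sent": {"P1": 2}, "recv": {"P1": 0}}.
Checkpoint = Dict[str, Dict[str, int]]


def is_consistent(checkpoints: Dict[str, Checkpoint]) -> bool:
    """Return True if the set of checkpoints contains no orphan message.

    A message is an orphan when the receiver's checkpoint counts it as
    received but the sender's checkpoint does not count it as sent.
    """
    for receiver, cp in checkpoints.items():
        for sender, received in cp.get("recv", {}).items():
            # Missing senders or counters are treated as zero messages sent.
            sent = checkpoints.get(sender, {}).get("sent", {}).get(receiver, 0)
            if received > sent:
                return False  # receiver saw a message the sender "never sent"
    return True


# P0 has sent 2 messages to P1, and P1 has received 1: one message is still
# in transit, which is allowed.
in_transit = {
    "P0": {"sent": {"P1": 2}, "recv": {"P1": 0}},
    "P1": {"sent": {"P0": 0}, "recv": {"P0": 1}},
}

# P1 has received 2 messages from P0, but P0's checkpoint records only 1 sent.
orphan = {
    "P0": {"sent": {"P1": 1}, "recv": {"P1": 0}},
    "P1": {"sent": {"P0": 0}, "recv": {"P0": 2}},
}

print(is_consistent(in_transit))
print(is_consistent(orphan))
```

In this sketch, messages still in transit at checkpoint time do not violate consistency; whether and how such messages are logged for replay after migration is a separate design question that the abstract does not address.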
Pages: 80-85
Page count: 6