FNB: Fast Non-Blocking Coordinated Checkpointing Protocol for Distributed Systems

Cited by: 4
Authors
Abdelhafidi, Zohra [1]
Djoudi, Mohamed [1]
Lagraa, Nasreddine [1]
Yagoubi, Mohamed Bachir [1]
Affiliations
[1] Amar Telidji Univ, Comp Sci & Math Lab, Laghouat 03000, Algeria
Keywords
Distributed systems; Fault tolerance; Coordinated checkpointing; Dependency; Popular process; GLOBAL-SNAPSHOT ALGORITHMS; LARGE-SCALE; ROLLBACK-RECOVERY; MODEL; LOGP
DOI
10.1007/s00224-014-9599-8
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents a Fast Non-Blocking (FNB) coordinated checkpointing protocol for distributed systems that aims to minimize the number of requests and mutable checkpoints while reducing checkpointing latency. The protocol relies on two mechanisms. The first is piggybacking dependency information on computation and reply messages, which tracks direct, transitive, and hidden dependencies among processes. The second is the use of popular processes: because processes communicate with one another, it is preferable for checkpointing to be initiated by popular processes, which hold more dependency information. This may reduce both the checkpointing latency and the likelihood that checkpointing halts because a fault occurs. We also present a simulation study that compares our protocol with the CSNB (Cao and Singhal Non-Blocking) and CSB (Cao and Singhal Blocking) protocols.
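The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Process` class, the set-based dependency representation, and the `most_popular` rule (largest dependency set, ties broken by lowest id) are all assumptions made for illustration, and hidden dependencies, reply messages, and mutable checkpoints are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    # Assumed representation: ids of processes this one depends on
    # since its last checkpoint.
    deps: set = field(default_factory=set)

    def send(self):
        # Piggyback a snapshot of the sender's dependency information.
        return {"sender": self.pid, "deps": frozenset(self.deps)}

    def receive(self, msg):
        # Direct dependency on the sender...
        self.deps.add(msg["sender"])
        # ...plus transitive dependencies carried by the message.
        self.deps |= msg["deps"] - {self.pid}


def most_popular(processes):
    # Hypothetical popularity rule: the process that knows the most
    # dependencies; ties go to the lowest pid.
    return max(processes, key=lambda p: (len(p.deps), -p.pid))


procs = [Process(i) for i in range(4)]
procs[1].receive(procs[0].send())  # P1 depends on P0
procs[2].receive(procs[1].send())  # P2 depends on P1 and, transitively, P0
procs[3].receive(procs[2].send())  # P3 depends on P2, P1, and P0
initiator = most_popular(procs)
```

In this toy run the process at the end of the message chain accumulates the most dependency information, so under the assumed rule it would be the one chosen to initiate checkpointing.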
Pages: 397-425
Number of pages: 29