FNB: Fast Non-Blocking Coordinated Checkpointing Protocol for Distributed Systems

Cited by: 4
Authors
Abdelhafidi, Zohra [1]
Djoudi, Mohamed [1]
Lagraa, Nasreddine [1]
Yagoubi, Mohamed Bachir [1]
Affiliations
[1] Amar Telidji Univ, Comp Sci & Math Lab, Laghouat 03000, Algeria
Keywords
Distributed systems; Fault tolerance; Coordinated checkpointing; Dependency; Popular process; GLOBAL-SNAPSHOT ALGORITHMS; LARGE-SCALE; ROLLBACK-RECOVERY; MODEL; LOGP;
DOI
10.1007/s00224-014-9599-8
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This paper presents a Fast Non-Blocking (FNB) coordinated checkpointing protocol for distributed systems that aims to minimize the number of requests and mutable checkpoints while reducing checkpointing latency. The protocol relies on two mechanisms. The first is piggybacking dependency information on computation and reply messages, thereby tracking direct, transitive, and hidden dependencies among processes. The second is the use of popular processes: because processes communicate with one another, it is preferable for the checkpointing procedure to be initiated by popular processes, which hold more dependency information. Doing so may reduce both the checkpointing latency and the likelihood that checkpointing halts because a fault occurs. We also present a simulation study that compares our protocol to the CSNB protocol (Cao and Singhal Non-Blocking) and the CSB protocol (Cao and Singhal Blocking).
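The dependency-tracking idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the FNB protocol itself: the `Process` class, the message layout, and the `most_popular` helper are hypothetical names introduced here, and "popularity" is approximated simply as the size of a process's dependency set, which is an assumption rather than the paper's definition. The sketch covers only direct and transitive dependencies, not hidden ones.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    # Ids of processes this one depends on since its last checkpoint
    # (hypothetical representation, not taken from the paper).
    deps: set = field(default_factory=set)

    def send(self):
        # Piggyback the sender's id and its known dependencies on the message.
        return {"sender": self.pid, "deps": frozenset(self.deps)}

    def receive(self, msg):
        # Direct dependency on the sender; transitive dependencies come
        # from the piggybacked set (excluding this process itself).
        self.deps.add(msg["sender"])
        self.deps |= msg["deps"] - {self.pid}


def most_popular(processes):
    # Assumption: the process with the largest dependency set is the
    # preferred initiator; ties go to the lowest pid.
    return max(processes, key=lambda p: (len(p.deps), -p.pid))


p0, p1, p2 = Process(0), Process(1), Process(2)
p1.receive(p0.send())  # P1 now depends on P0
p2.receive(p1.send())  # P2 depends on P1 directly and on P0 transitively
initiator = most_popular([p0, p1, p2])
```

In this toy run, P2 ends up knowing about both P0 and P1, so it would be selected as the initiator under the stated assumption.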
Pages: 397-425
Number of pages: 29
Related papers
50 in total
  • [21] Non-blocking distributed transaction processing system
    Kommareddy, M
    Wong, J
    JOURNAL OF SYSTEMS AND SOFTWARE, 2000, 54 (01) : 65 - 76
  • [22] A Distributed Counter-based Non-blocking Coordinated Checkpoint Algorithm for Grid Computing Applications
    El-Sayed, Gamal A.
    Hossny, Khadra A.
    2012 2ND INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTATIONAL TOOLS FOR ENGINEERING APPLICATIONS (ACTEA), 2012, : 80 - 85
  • [23] Non-blocking message total ordering protocol
    WANG Yun
    WANG JunLing (School of Computer Science & Engineering; Key Lab of Computer Network & Information Integration)
    SCIENCE IN CHINA SERIES F: INFORMATION SCIENCES, 2008, (12) : 1919 - 1934
  • [24] A new non-blocking counter-based coordinated checkpointing algorithm as a migration tool in a high performance dynamic Grid scheduler
    El-Sayed, GA
    Greensheids, IR
    PDPTA '04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS 1-3, 2004, : 217 - 223
  • [25] A hybrid coordinated checkpointing protocol for mobile computing systems
    Kumar, Parveen
    Kumar, Lalit
    Chauhan, R. K.
    IETE JOURNAL OF RESEARCH, 2006, 52 (2-3) : 247 - 254
  • [26] A case study of agreement problems in distributed systems: Non-blocking atomic commitment
    Raynal, M
    1997 HIGH-ASSURANCE ENGINEERING WORKSHOP - PROCEEDINGS, 1997, : 209 - 214
  • [27] AN EFFICIENT PROTOCOL FOR CHECKPOINTING RECOVERY IN DISTRIBUTED SYSTEMS
    KIM, JL
    PARK, T
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1993, 4 (08) : 955 - 960
  • [28] A weighted checkpointing protocol for mobile distributed systems
    Awasthi, Lalit K.
    Misra, Manoj
    Joshi, R. C.
    INTERNATIONAL JOURNAL OF AD HOC AND UBIQUITOUS COMPUTING, 2010, 5 (03) : 137 - 149
  • [29] A Non-Blocking Online Cake-Cutting Protocol
    Kubo, Koki
    Manabe, Yoshifumi
    PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON MATHEMATICS AND COMPUTERS IN SCIENCES AND IN INDUSTRY (MCSI 2016), 2016, : 258 - 263
  • [30] Non-blocking PMI Extensions for Fast MPI Startup
    Chakraborty, S.
    Subramoni, H.
    Moody, A.
    Venkatesh, A.
    Perkins, J.
    Panda, D. K.
    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015, : 131 - 140