An efficient computing-checkpoint based coordinated checkpoint algorithm

Cited by: 0
Authors
Men Chaoguang [1]
Wang Dongsheng
Zhao Yunlong
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Harbin Engn Univ, Res Ctr High Dependabil Comp Technol, Harbin 150001, Heilongjiang, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, the concept of a "computing checkpoint" is introduced, and an efficient coordinated checkpoint algorithm is proposed. The algorithm combines two approaches to reducing the overhead of coordinated checkpointing: minimizing the number of processes that take checkpoints, and making the checkpointing process non-blocking. By piggybacking information about which processes have taken a new checkpoint on the broadcast commit message, every process keeps a consistent view of the checkpoint sequence numbers of all processes, so that unnecessary checkpoints and orphan messages are avoided in subsequent execution. Evaluation results show that the number of redundant computing checkpoints is less than 1/10 of the number of tentative checkpoints. Analysis and experiments show that the overhead of the algorithm is lower than that of other coordinated checkpoint algorithms.
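The piggybacking idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the names `CommitMessage`, `Process`, `on_commit`, `looks_stale`, and `broadcast_commit` are hypothetical, and the rule that a message tagged with a higher sequence number than the receiver knows would trigger an extra checkpoint is an assumption borrowed from common non-blocking coordinated schemes. The sketch only shows how a commit broadcast that lists the processes with new checkpoints keeps every process's view of the sequence numbers consistent.

```python
from dataclasses import dataclass, field


@dataclass
class CommitMessage:
    initiator: int
    # Piggybacked information (assumed layout): pid -> sequence number
    # of the checkpoint that process took in the round being committed.
    new_csn: dict


@dataclass
class Process:
    pid: int
    n: int  # total number of processes
    # csn[j] is this process's view of process j's checkpoint sequence number.
    csn: list = field(default_factory=list)

    def __post_init__(self):
        if not self.csn:
            self.csn = [0] * self.n

    def on_commit(self, msg):
        # Merge the piggybacked sequence numbers into the local view.
        for pid, seq in msg.new_csn.items():
            self.csn[pid] = max(self.csn[pid], seq)

    def looks_stale(self, sender, sender_csn):
        # Assumed rule: a message tagged with a higher sequence number than
        # the local view of the sender would force an extra checkpoint.
        return sender_csn > self.csn[sender]


def broadcast_commit(processes, msg):
    # Stand-in for a network broadcast: deliver the commit to every process.
    for p in processes:
        p.on_commit(msg)


procs = [Process(pid=i, n=4) for i in range(4)]
# Processes 0 and 2 took new checkpoints in this round; 1 and 3 did not.
commit = CommitMessage(initiator=0, new_csn={0: 1, 2: 1})

before = procs[3].looks_stale(sender=2, sender_csn=1)
broadcast_commit(procs, commit)
after = procs[3].looks_stale(sender=2, sender_csn=1)
```

In this sketch process 3 took no checkpoint in the round, yet after the commit broadcast it knows that process 2 is at sequence number 1, so a later message from process 2 no longer appears to call for an extra checkpoint.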
Pages: 99-109
Number of pages: 11