An efficient computing-checkpoint based coordinated checkpoint algorithm

Authors
Men Chaoguang [1]
Wang Dongsheng
Zhao Yunlong
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Harbin Engn Univ, Res Ctr High Dependabil Comp Technol, Harbin 150001, Heilongjiang, Peoples R China
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper introduces the concept of a "computing checkpoint" and proposes an efficient coordinated checkpoint algorithm based on it. The algorithm combines two approaches to reducing the overhead of coordinated checkpointing: minimizing the number of processes that take checkpoints, and making the checkpointing process non-blocking. The broadcast commit message piggybacks information about which processes have taken a new checkpoint, so the checkpoint sequence number of every process is kept consistent across all processes, and unnecessary checkpoints and orphan messages are avoided in subsequent execution. Evaluation results show that the number of redundant computing checkpoints is less than 1/10 of the number of tentative checkpoints. Analysis and experiments show that the overhead of the algorithm is lower than that of other coordinated checkpoint algorithms.
Pages: 99-109
Number of pages: 11
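
The abstract's point about piggybacking checkpoint information on the broadcast commit message can be illustrated with a minimal sketch. This is not the authors' algorithm, which this record does not describe in detail. It only assumes that each process keeps a vector of checkpoint sequence numbers, one entry per process, and that the commit message lists the processes that took a new checkpoint. The names `Process`, `on_commit`, and `broadcast_commit` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process holding its view of everyone's checkpoint sequence numbers."""
    pid: int
    n: int  # total number of processes in the system
    csn: list = field(default_factory=list)  # csn[j]: last known sequence number of process j

    def __post_init__(self):
        if not self.csn:
            self.csn = [0] * self.n

    def on_commit(self, checkpointed):
        """Apply a commit message that piggybacks the set of processes
        that took a new checkpoint in this round."""
        for j in checkpointed:
            self.csn[j] += 1


def broadcast_commit(processes, checkpointed):
    """Assumed reliable broadcast: every process receives the same commit message."""
    for p in processes:
        p.on_commit(checkpointed)


procs = [Process(pid=i, n=4) for i in range(4)]
broadcast_commit(procs, {0, 2})  # round 1: only P0 and P2 took checkpoints
broadcast_commit(procs, {2, 3})  # round 2: only P2 and P3 took checkpoints
views = [p.csn for p in procs]   # every process now holds the same vector
```

Because every process applies the same commit message, processes that did not take part in a round still learn the up-to-date sequence numbers of those that did, which is the consistency property the abstract refers to.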