Static analysis for application-level checkpointing of MPI programs

Authors
Wang, Panfeng [1]
Du, Yunfei [1]
Fu, Hongyi [1]
Yang, Xuejun [1]
Zhou, Haifang [1]
Affiliations
[1] Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Coll Comp, Changsha 410073, Hunan, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Application-level checkpointing is a promising technology for large-scale scientific computing. To restore a computation correctly, the global checkpoint must be guaranteed to be consistent. Usually, complex coordinated protocols are employed to ensure this consistency, and they require logging orphan or in-transit messages during checkpointing. These protocols complicate recovery of the computation, and the message logging increases checkpoint overhead. This paper proposes a new method that ensures the consistency of the global checkpoint by static analysis. The method identifies safe checkpointing regions in MPI programs, in which the global checkpoint is always strongly consistent. All checkpoints are placed in those safe checkpointing regions. During checkpointing, the method logs no messages and introduces no extra overhead. The method was implemented and integrated into ALEC, a source-to-source precompiler for automating application-level checkpointing. The experimental results show that the method is effective.
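The consistency property the abstract relies on can be illustrated with a small sketch. This is not the paper's static analysis and not part of ALEC. It is a hypothetical runtime check over hand-written event traces, assuming the usual definition that a global checkpoint is strongly consistent when no message is in transit (sent before the sender's checkpoint, received after the receiver's) and no message is orphaned (received before the receiver's checkpoint, sent after the sender's). The names `classify_cut`, `is_strongly_consistent`, and `traces` are invented for illustration.

```python
from typing import List, Sequence, Set, Tuple

# One event is ("send" | "recv", message_id). All names here are hypothetical.
Event = Tuple[str, int]


def classify_cut(
    traces: Sequence[List[Event]], cut: Sequence[int]
) -> Tuple[Set[int], Set[int]]:
    """Return (in_transit, orphan) message ids for a global checkpoint.

    cut[i] is the number of events process i has executed before it
    takes its local checkpoint.
    """
    sent_before: Set[int] = set()
    recv_before: Set[int] = set()
    for trace, position in zip(traces, cut):
        for kind, msg in trace[:position]:
            if kind == "send":
                sent_before.add(msg)
            else:
                recv_before.add(msg)
    in_transit = sent_before - recv_before  # sent, but not yet received
    orphan = recv_before - sent_before      # received, but send not recorded
    return in_transit, orphan


def is_strongly_consistent(
    traces: Sequence[List[Event]], cut: Sequence[int]
) -> bool:
    """True when the cut has neither in-transit nor orphan messages."""
    in_transit, orphan = classify_cut(traces, cut)
    return not in_transit and not orphan


# Two processes exchanging messages 1 and 2 (assumed example data).
traces = [
    [("send", 1), ("recv", 2)],  # process 0
    [("recv", 1), ("send", 2)],  # process 1
]
```

With these traces, the cuts `(0, 0)`, `(1, 1)`, and `(2, 2)` are strongly consistent, while `(1, 0)` leaves message 1 in transit and `(0, 1)` makes it an orphan. A checkpoint placed only at program points like the first three needs no message logging, which is the situation the safe checkpointing regions are meant to guarantee.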
Pages: 548-555
Page count: 8