Compiler-Assisted Application-Level Checkpointing for MPI Programs

Cited by: 3
Authors
Yang, Xuejun [1]
Wang, Panfeng [1]
Fu, Hongyi [1]
Du, Yunfei [1]
Wang, Zhiyuan [1]
Jia, Jia [1]
Affiliations
[1] Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Coll Comp, Changsha, Hunan, Peoples R China
DOI
10.1109/ICDCS.2008.25
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Application-level checkpointing can decrease the overhead of fault tolerance by minimizing the amount of checkpoint data. However, this technique requires the programmer to manually choose the critical data that should be saved. In this paper, we first propose a live-variable analysis method for MPI programs. We then provide a data-saving optimization for application-level checkpointing based on this analysis. Building on this theoretical foundation, we implement a source-to-source pre-compiler (ALEC) to automate application-level checkpointing. Finally, on a 512-CPU cluster system, we evaluate the performance of five FORTRAN/MPI programs that ALEC has transformed to integrate checkpointing features. The experimental results show that (i) application-level checkpointing based on live-variable analysis for MPI programs can efficiently reduce the amount of checkpoint data, thereby decreasing the overhead of checkpoint and restart; and (ii) ALEC is capable of automating application-level checkpointing correctly and effectively.
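
To illustrate the idea behind the abstract, the following is a minimal sketch of textbook backward live-variable analysis used to pick the data to save at a checkpoint. It is not ALEC's algorithm: the record does not describe how ALEC handles MPI communication, so that part is not modeled here. The names `Stmt`, `live_in_sets`, `program`, and the toy statements are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """One statement in a toy control-flow graph (hypothetical model)."""
    text: str
    defs: frozenset   # variables written by the statement
    uses: frozenset   # variables read by the statement
    succs: tuple      # indices of successor statements


def live_in_sets(stmts):
    """Solve the classic backward dataflow equations to a fixed point:

        live_out[i] = union of live_in[j] for every successor j of i
        live_in[i]  = uses[i] | (live_out[i] - defs[i])
    """
    live_in = [set() for _ in stmts]
    changed = True
    while changed:
        changed = False
        for i in reversed(range(len(stmts))):
            s = stmts[i]
            live_out = set().union(*(live_in[j] for j in s.succs))
            new_in = set(s.uses) | (live_out - s.defs)
            if new_in != live_in[i]:
                live_in[i] = new_in
                changed = True
    return live_in


def S(text, defs, uses, succs):
    """Small helper to build a statement from plain Python collections."""
    return Stmt(text, frozenset(defs), frozenset(uses), tuple(succs))


# Toy program: a loop whose body starts with a checkpoint (statement 3).
program = [
    S("n = read_input()",   {"n"},   set(),        [1]),
    S("tmp = setup(n)",     {"tmp"}, {"n"},        [2]),
    S("x = init(tmp)",      {"x"},   {"tmp"},      [3]),
    S("checkpoint()",       set(),   set(),        [4]),
    S("buf = compute(x)",   {"buf"}, {"x"},        [5]),
    S("x = update(x, buf)", {"x"},   {"x", "buf"}, [3, 6]),  # loop back or exit
    S("write_output(x)",    set(),   {"x"},        []),
]

live = live_in_sets(program)
CHECKPOINT = 3
# Only variables live at the checkpoint can affect execution after a restart.
to_save = sorted(live[CHECKPOINT])
print("save at checkpoint:", to_save)
```

In this toy program four variables are in scope at the checkpoint (`n`, `tmp`, `buf`, `x`), but only `x` is read again before being overwritten, so a checkpoint that saves the live set alone is smaller than one that saves every variable.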
Pages: 251-259
Page count: 9
Related papers
50 in total
  • [1] Automated application-level checkpointing of MPI programs
    Bronevetsky, G
    Marques, D
    Pingali, K
    Stodghill, P
    ACM SIGPLAN NOTICES, 2003, 38 (10) : 84 - 94
  • [2] Static analysis for application-level checkpointing of MPI programs
    Wang, Panfeng
    Du, Yunfei
    Fu, Hongyi
    Yang, Xuejun
    Zhou, Haifang
    HPCC 2008: 10TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2008, : 548 - 555
  • [3] COMPILER-ASSISTED FULL CHECKPOINTING
    LI, CCJ
    STEWART, EM
    FUCHS, WK
    SOFTWARE-PRACTICE & EXPERIENCE, 1994, 24 (10): : 871 - 886
  • [4] Compiler-assisted heterogeneous checkpointing
    Karablieh, F
    Bazzi, RA
    Hicks, M
    20TH IEEE SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS, 2001, : 56 - 65
  • [5] C3: A system for automating application-level checkpointing of MPI programs
    Bronevetsky, G
    Marques, D
    Pingali, K
    Stodghill, P
    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, 2004, 2958 : 357 - 373
  • [6] Automated Application-Level Checkpointing Based on Live-variable Analysis in MPI Programs
    Wang, Panfeng
    Yang, Xuejun
    Fu, Hongyi
    Du, Yunfei
    Wang, Zhiyuan
    Jia, Jia
    PPOPP'08: PROCEEDINGS OF THE 2008 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2008, : 273 - 274
  • [7] WBC-ALC: A Weak Blocking Coordinated Application-Level Checkpointing for MPI Programs
    Xu, Xinhai
    Yang, Xuejun
    Lin, Yufei
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (03): : 786 - 796
  • [8] Application-level checkpointing for shared memory programs
    Bronevetsky, G
    Marques, D
    Pingali, K
    Szwed, P
    Schulz, M
    ACM SIGPLAN NOTICES, 2004, 39 (11) : 235 - 247
  • [9] Application-level checkpointing techniques for parallel programs
    Walters, John Paul
    Chaudhary, Vipin
    DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY, PROCEEDINGS, 2006, 4317 : 221 - +
  • [10] Portable Application-level Checkpointing for Hybrid MPI-OpenMP Applications
    Losada, Nuria
    Martin, Maria J.
    Rodriguez, Gabriel
    Gonzalez, Patricia
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016), 2016, 80 : 19 - 29