Compiler-Assisted Application-Level Checkpointing for MPI Programs

Cited by: 3
Authors
Yang, Xuejun [1 ]
Wang, Panfeng [1 ]
Fu, Hongyi [1 ]
Du, Yunfei [1 ]
Wang, Zhiyuan [1 ]
Jia, Jia [1 ]
Affiliations
[1] Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Coll Comp, Changsha, Hunan, Peoples R China
DOI
10.1109/ICDCS.2008.25
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Application-level checkpointing can decrease the overhead of fault tolerance by minimizing the amount of checkpoint data. However, this technique requires the programmer to manually choose the critical data that should be saved. In this paper, we first propose a live-variable analysis method for MPI programs. Then, based on this analysis method, we provide a method for optimizing the data saved by application-level checkpointing. Building on this theoretical foundation, we implement a source-to-source pre-compiler (ALEC) to automate application-level checkpointing. Finally, on a 512-CPU cluster system, we evaluate the performance of five FORTRAN/MPI programs that ALEC has transformed to integrate checkpointing features. The experimental results show that i) application-level checkpointing based on live-variable analysis for MPI programs can efficiently reduce the amount of checkpoint data, thereby decreasing the overhead of checkpoint and restart; ii) ALEC is capable of automating application-level checkpointing correctly and effectively.
Pages: 251-259
Page count: 9
Related papers
50 in total
  • [11] Resilient MPI applications using an application-level checkpointing framework and ULFM
    Losada, Nuria
    Cores, Ivan
    Martin, Maria J.
    Gonzalez, Patricia
    JOURNAL OF SUPERCOMPUTING, 2017, 73 (01): 100-113
  • [12] Resilient MPI applications using an application-level checkpointing framework and ULFM
    Nuria Losada
    Iván Cores
    María J. Martín
    Patricia González
    The Journal of Supercomputing, 2017, 73: 100-113
  • [13] Compiler-Assisted Overlapping of Communication and Computation in MPI Applications
    Guo, Jichi
    Yi, Qing
    Meng, Jiayuan
    Zhang, Junchao
    Balaji, Pavan
    2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016: 60-69
  • [14] Compiler-Assisted Checkpointing of Parallel Codes: The Cetus and LLVM Experience
    Rodriguez, Gabriel
    Martin, Maria J.
    Gonzalez, Patricia
    Tourino, Juan
    Doallo, Ramon
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2013, 41 (06): 782-805
  • [15] Local rollback for resilient MPI applications with application-level checkpointing and message logging
    Losada, Nuria
    Bosilca, George
    Bouteiller, Aurelien
    Gonzalez, Patricia
    Martin, Maria J.
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 91: 450-464
  • [16] Compiler-Assisted Checkpointing of Parallel Codes: The Cetus and LLVM Experience
    Gabriel Rodríguez
    María J. Martín
    Patricia González
    Juan Touriño
    Ramón Doallo
    International Journal of Parallel Programming, 2013, 41: 782-805
  • [17] Compiler-assisted thread level control speculation
    Miura, H
    Hung, LD
    Iwama, C
    Tashiro, D
    Barli, ND
    Sakai, S
    Tanaka, H
    EURO-PAR 2003 PARALLEL PROCESSING, PROCEEDINGS, 2003, 2790: 603-608
  • [18] An application-level checkpointing based on extended data flow analysis for OpenMP programs
    Fu H.-Y.
    Ding Y.
    Song W.
    Yang X.-J.
    Jisuanji Xuebao/Chinese Journal of Computers, 2010, 33 (10): 1809-1822
  • [19] CPPC: a compiler-assisted tool for portable checkpointing of message-passing applications
    Rodriguez, Gabriel
    Martin, Maria J.
    Gonzalez, Patricia
    Tourino, Juan
    Doallo, Ramon
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2010, 22 (06): 749-766
  • [20] ITALC: Interactive Tool for Application-Level Checkpointing
    Arora, Ritu
    Trung Nguyen Ba
    HUST'17: PROCEEDINGS OF THE FOURTH INTERNATIONAL WORKSHOP ON HPC USER SUPPORT TOOLS, 2017