Portable Application-level Checkpointing for Hybrid MPI-OpenMP Applications

Cited by: 6
Authors
Losada, Nuria [1]
Martin, Maria J. [1]
Rodriguez, Gabriel [1]
Gonzalez, Patricia [1]
Affiliation
[1] Univ A Coruna, Grp Arquitectura Comp, La Coruna, Spain
Keywords
Multicore Clusters; Hybrid MPI-OpenMP; Fault Tolerance; Checkpointing
DOI
10.1016/j.procs.2016.05.294
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline code
081202
Abstract
As parallel machines increase their number of processors, the failure rate of the global system increases as well; thus, long-running applications will need to use fault tolerance techniques to ensure successful completion of their execution. Most current HPC systems are built as clusters of multicores, and the hybrid MPI-OpenMP paradigm provides numerous benefits on these systems. This paper presents a checkpointing solution for hybrid MPI-OpenMP applications in which checkpoint consistency is guaranteed by an intra-node coordination protocol, while no inter-node coordination is needed. The proposal reduces network utilization and storage requirements in order to optimize the I/O cost of fault tolerance, while minimizing the checkpointing overhead. In addition, the portability of the solution and the dynamic parallelism provided by OpenMP enable applications to be restarted on machines with different architectures, operating systems and/or numbers of cores, adapting the number of running OpenMP threads to make the best use of the available resources. An extensive evaluation using hybrid MPI-OpenMP applications from the ASC Sequoia Benchmark Codes and the NERSC-8/Trinity benchmarks is presented, showing the effectiveness and efficiency of the approach.
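The abstract states that checkpoint consistency only requires coordination among the threads of a node, and that a restart may use a different number of OpenMP threads. The Python sketch below illustrates both ideas under loudly stated assumptions: Python threads stand in for the OpenMP threads of a single MPI process, a `threading.Barrier` stands in for the intra-node coordination protocol, and the snapshot holds only application data (an array and a step counter), never per-thread state, so the resumed run can split the work among any number of threads. The functions `run` and `restart`, the `fail_at` crash simulation, the placeholder update and the JSON file layout are all hypothetical. This is not the authors' implementation, and it leaves out MPI and the handling of inter-node messages entirely.

```python
import json
import os
import tempfile
import threading


def run(data, first_step, total_steps, nthreads, ckpt_every, ckpt_path, fail_at=None):
    """Advance `data` from `first_step` to `total_steps` with `nthreads` threads.

    Hypothetical sketch of intra-node coordinated checkpointing: each thread
    owns a contiguous chunk of `data`; after every step all threads of this
    process meet at a barrier, and on checkpoint steps the barrier action
    (run by exactly one thread while the others wait) writes a snapshot.
    No other process is contacted. `fail_at` simulates a crash at that step.
    Returns the step at which execution stopped.
    """
    n = len(data)
    chunks = [(t * n // nthreads, (t + 1) * n // nthreads) for t in range(nthreads)]
    step = [first_step]  # shared counter, only modified inside the barrier action

    def at_barrier():
        # All threads have finished the current step, so `data` is consistent.
        step[0] += 1
        if step[0] % ckpt_every == 0:
            tmp = ckpt_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"step": step[0], "data": data}, f)
            os.replace(tmp, ckpt_path)  # atomic swap: no half-written checkpoint

    barrier = threading.Barrier(nthreads, action=at_barrier)

    def worker(lo, hi):
        # Every thread reads the same step value after each barrier release.
        while step[0] < total_steps and step[0] != fail_at:
            for i in range(lo, hi):
                data[i] = (31 * data[i] + 7) % 1_000_003  # placeholder update
            barrier.wait()

    threads = [threading.Thread(target=worker, args=c) for c in chunks]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return step[0]


def restart(ckpt_path, total_steps, nthreads, ckpt_every):
    """Resume from the last checkpoint, possibly with a different thread count."""
    with open(ckpt_path) as f:
        ckpt = json.load(f)
    data = ckpt["data"]
    run(data, ckpt["step"], total_steps, nthreads, ckpt_every, ckpt_path)
    return data


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    reference = list(range(10))
    run(reference, 0, 7, 4, 2, os.path.join(workdir, "ref.json"))

    ckpt = os.path.join(workdir, "node0.json")
    crashed = list(range(10))
    run(crashed, 0, 7, 4, 2, ckpt, fail_at=5)  # last checkpoint on disk: step 4
    resumed = restart(ckpt, 7, 3, 2)             # resume with 3 threads instead of 4
    print("matches uninterrupted run:", resumed == reference)
```

Writing to a temporary file and renaming it with `os.replace` is one common way to avoid leaving a corrupt checkpoint if the process fails mid-write; it is a design choice of this sketch, not something the abstract describes.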
Pages: 19-29
Page count: 11