A technique for non-invasive application-level checkpointing

Cited by: 8
Authors
Arora, Ritu [1]
Bangalore, Purushotham [1]
Mernik, Marjan [1, 2]
Affiliations
[1] Univ Alabama Birmingham, Dept Comp & Informat Sci, Birmingham, AL 35294 USA
[2] Univ Maribor, Fac Elect Engn & Comp Sci, SLO-2000 Maribor, Slovenia
Source
Journal of Supercomputing | 2011 / Vol. 57 / No. 3
Funding
U.S. National Science Foundation
Keywords
Fault-tolerance; Application-level checkpointing; Domain-specific language; PARALLEL
DOI
10.1007/s11227-010-0383-5
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Checkpointing is one of the key elements required for writing self-healing applications for distributed and dynamic computing environments. It is a mechanism that makes an application resilient to failures by periodically storing its state to disk. The main goal of this research is to enable non-invasive reengineering of existing applications so that an Application-Level Checkpointing (ALC) mechanism can be added to them. The Domain-Specific Language (DSL) developed in this research serves this purpose: it is used to obtain the ALC specifications from end users. These specifications are then used to generate the actual checkpointing code and insert it into the existing application. The performance of the application with the generated checkpointing code is comparable to that of the application in which the checkpointing code was inserted manually. With slight modifications, the DSL can be used to specify the ALC mechanism in several base languages (e.g., C/C++, Java, and FORTRAN).
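Illustration (not from the paper): the abstract describes checkpointing as periodically storing application state to disk so that a run can resume after a failure. The following is a minimal hand-written sketch of that general idea in Python. It is not the paper's DSL nor the code it generates; the names `save_checkpoint`, `load_checkpoint`, `run`, and the `fail_at` parameter are hypothetical and exist only for this example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to disk atomically so a crash never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename the finished temp file over the old checkpoint in one step.
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or a copy of `default` if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return dict(default)


def run(path, n_steps, interval, fail_at=None):
    """Sum the integers 0..n_steps-1, saving state every `interval` steps.

    `fail_at` is a hypothetical hook that simulates a crash at a given step.
    """
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If a run is interrupted, calling `run` again with the same path resumes from the last saved step rather than from the beginning. The checkpoint calls here are written by hand inside the loop, which is the invasive approach the paper aims to avoid by generating and inserting such code from user specifications.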
Pages: 227-255
Page count: 29