Performance evaluation of automatic checkpoint-based fault tolerance for AMPI and Charm++

Cited by: 5
Affiliation: Department of Computer Science, University of Illinois at Urbana-Champaign
Source: Oper Syst Rev ACM, 2006, issue 2, pp. 90-99
Keywords: Fault tolerant computer systems
DOI: 10.1145/1131322.1131340
Related papers (50 in total)
  • [1] Checkpoint-based Fault-tolerance for LEACH Protocol
    Lehsaini, Mohamed
    Guyennet, Herve
    2014 6TH INTERNATIONAL CONFERENCE ON NEW TECHNOLOGIES, MOBILITY AND SECURITY (NTMS), 2014
  • [2] FTC-Charm++: An in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
    Zheng, GB
    Shi, LX
    Kalé, LV
    2004 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, 2004: 93 - 103
  • [3] Optimizing checkpoint-based fault-tolerance in distributed stream processing systems: Theory to practice
    Jayasekara, Sachini
    Karunasekera, Shanika
    Harwood, Aaron
    SOFTWARE-PRACTICE & EXPERIENCE, 2022, 52 (01): 296 - 315
  • [4] Checkpoint-based Fault-tolerant Infrastructure for Virtualized Service Providers
    Goiri, Inigo
    Julia, Ferran
    Guitart, Jordi
    Torres, Jordi
    PROCEEDINGS OF THE 2010 IEEE-IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, 2010: 455 - 462
  • [5] CHARM: A Checkpoint-based Resource Management Framework for Reliable Multicore Computing in the Dark Silicon Era
    Raparti, Venkata Yaswanth
    Kapadia, Nishit
    Pasricha, Sudeep
    PROCEEDINGS OF THE 34TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD), 2016: 201 - 208
  • [6] CHIME: A Checkpoint-Based Approach to Improving the Performance of Shared Clusters
    Shao, Yiyang
    Zhu, Xiaomin
    Bao, Weidong
    Zhou, Wen
    Xiao, Wenhua
    2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2016: 1007 - 1014
  • [7] A Unit-based Checkpoint Algorithm Supporting Fault Tolerance
    Li, Hong-liang
    2013 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (ICCSAI 2013), 2013: 381 - 385
  • [8] CRAFT: A Library for Easier Application-Level Checkpoint/Restart and Automatic Fault Tolerance
    Shahzad, Faisal
    Thies, Jonas
    Kreutzer, Moritz
    Zeiser, Thomas
    Hager, Georg
    Wellein, Gerhard
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2019, 30 (03) : 501 - 514
  • [9] Automatic Checkpointing based Fault Tolerance in Computational Grid
    Babu, Ch. Ramesh
    Rao, Ch. D. V. Subba
    2014 INTERNATIONAL CONFERENCE ON COMPUTING, MANAGEMENT AND TELECOMMUNICATIONS (COMMANTEL), 2014: 41 - 45
  • [10] Fault tolerance evaluation for component based models
    Dion, J. M.
    2012 2ND INTERNATIONAL CONFERENCE ON COMMUNICATIONS, COMPUTING AND CONTROL APPLICATIONS (CCCA), 2012