A New Roll-Forward Checkpointing / Recovery Mechanism for Cluster Federation

Authors
Gupta, B. [1]
Rahimi, S. [1]
Ahmad, R. [1]
Affiliations
[1] Southern Illinois Univ, Dept Comp Sci, Carbondale, IL 62901 USA
Keywords
Cluster Computing; Checkpoints; Recovery
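DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812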
Abstract
In this paper, we address the complex problem of determining a recovery line for a cluster federation and propose an efficient checkpointing/recovery mechanism for it. The main objective of the proposed approach is to advance the recovery line in a cluster federation so as to limit the amount of rollback by the processes in all the clusters in case of failure(s) in the federation; thus, in the worst case, only a limited domino effect is allowed. In this approach, processes in different clusters are able to perform their responsibilities independently and simultaneously. This inherent parallelism of the algorithm contributes to its speed of execution. We show that the proposed approach is superior to existing works because it neither suffers from any message storm nor takes any unnecessary checkpoints.
Pages: 292-298
Number of pages: 7