Design of new roll-forward recovery approach for distributed systems

Cited by: 5
Authors
Gupta, B [1]
Banerjee, SK
Liu, B
Affiliations
[1] So Illinois Univ, Dept Comp Sci, Carbondale, IL 62901 USA
[2] Univ Calcutta, Dept Comp Sci, Kolkata 700009, W Bengal, India
DOI
10.1049/ip-cdt:20020410
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A new roll-forward checkpointing scheme is proposed using basic checkpoints. The direct-dependency concept used in the communication-induced checkpointing scheme is applied to basic checkpoints to design a simple algorithm for finding a consistent global checkpoint. Both blocking (i.e. the application processes are suspended while the algorithm executes) and non-blocking approaches are presented. The use of forced checkpoints ensures a small re-execution time after recovery from a failure. The proposed approaches enjoy the main advantages of both the synchronous and the asynchronous approaches, i.e. simple recovery and a simple way to create checkpoints. Moreover, in the proposed blocking approach, the direct-dependency concept is implemented without piggybacking any extra information on the application messages. A very simple scheme for avoiding the creation of useless checkpoints is also proposed.
Pages: 105-112
Page count: 8