Pseudo-synchronous rollforward recovery approach using basic checkpoints for distributed systems

Times cited: 0
Authors
Gupta, B [1]
Banerjee, SK [1]
Liu, B [1]
Affiliation
[1] Southern Illinois University, Department of Computer Science, Carbondale, IL 62901, USA
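Keywords
synchronous checkpointing; asynchronous checkpointing; forced checkpoints; recovery
DOI
Not available
CLC number (Chinese Library Classification)
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract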
In this paper, a pseudo-synchronous approach to checkpointing and recovery is proposed that uses only basic checkpoints. The direct-dependency concept used in communication-induced approaches is applied to basic checkpoints to design a simple algorithm for finding a consistent global checkpoint. In addition, forced checkpoints are used to keep the re-execution time after recovery from a failure small. The proposed approach combines the advantages of the synchronous and asynchronous approaches, namely simple recovery and a simple way to create checkpoints. Moreover, the direct-dependency concept is implemented without piggybacking any extra information on application messages.
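The abstract does not spell out the algorithm, so the following is only a rough illustration of what finding a consistent global checkpoint from direct dependencies involves. It uses a generic rollback-propagation fixpoint, a standard textbook technique that is not necessarily the authors' algorithm. It assumes that each process numbers its checkpoints 0, 1, 2, ... (with 0 taken as the initial state) and that the send and receive checkpoint intervals of every message are known. The names `Message` and `consistent_global_checkpoint` are hypothetical. The sketch does not model forced checkpoints or the paper's mechanism for avoiding piggybacked information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical record of one application message (a direct dependency).

    The message was sent in the sender's interval `send_interval`, i.e. after
    checkpoint C[sender][send_interval] and before C[sender][send_interval + 1].
    `recv_interval` has the same meaning on the receiver side.
    """
    sender: int
    send_interval: int
    receiver: int
    recv_interval: int


def consistent_global_checkpoint(latest: list[int], messages: list[Message]) -> list[int]:
    """Roll back from `latest` until no message is an orphan.

    latest[i] is the index of the newest checkpoint of process i. Index 0 is
    assumed to be the initial state, which records no sends or receipts, so
    the loop always terminates.
    """
    chosen = list(latest)
    changed = True
    while changed:
        changed = False
        for m in messages:
            # Checkpoint c of a process records an event from interval k iff c > k.
            receipt_recorded = chosen[m.receiver] > m.recv_interval
            send_recorded = chosen[m.sender] > m.send_interval
            if receipt_recorded and not send_recorded:
                # Orphan message: roll the receiver back to the checkpoint
                # taken just before it received the message.
                chosen[m.receiver] = m.recv_interval
                changed = True
    return chosen
```

For example, with three processes whose latest checkpoints are `[2, 2, 2]`, the messages `Message(0, 1, 1, 1)`, `Message(1, 2, 2, 1)` and `Message(2, 1, 0, 1)` force a chain of rollbacks ending at `[1, 1, 1]`, whereas with no messages the latest checkpoints `[2, 2]` are already consistent. In the illustration, each index only ever decreases. This is why the fixpoint loop terminates.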
Pages: 476-480
Number of pages: 5