EFFICIENT CHECKPOINTING PROCEDURES FOR FAULT-TOLERANT DISTRIBUTED SYSTEMS

Cited by: 0
Authors
SALEH, K
AGARWAL, A
Affiliations
[1] KUWAIT UNIV,DEPT ELECT & COMP ENGN,SAFAT 13060,KUWAIT
[2] CONCORDIA UNIV,DEPT ELECT & COMP ENGN,MONTREAL H3G 1M8,QUEBEC,CANADA
[3] UNIV ROORKEE,DEPT ELECTR & COMP ENGN,ROORKEE 247667,UTTAR PRADESH,INDIA
Source
MICROPROCESSING AND MICROPROGRAMMING | 1994 / Vol. 40 / No. 06
Keywords
CHECKPOINTING; DISTRIBUTED SYSTEMS; FAULT TOLERANCE; ROLLBACK RECOVERY; SYSTEM STATE;
DOI
10.1016/0165-6074(94)90107-4
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
A classical approach for achieving fault tolerance in distributed systems is based on the incorporation of efficient and fault-tolerant procedures for checkpointing and recovery in such systems. We propose two checkpointing procedures, which can be initiated by any process in the system or upon failure of one or more component processes. Our procedures return the most recent and consistent checkpoints for the processes initiating the procedure, and do not interfere with the progress of the distributed system application. Furthermore, our procedures guarantee that a consistent checkpoint will be obtained when they terminate. Examples illustrating the application of the procedures are also provided.
Pages: 427-438
Page count: 12