A causal message logging protocol with asynchronous checkpointing for distributed systems

Authors
Ahn, J [1]
Kim, K [1]
Hwang, C [1]
Affiliations
[1] Korea Univ, Dept Comp Sci & Engn, Seoul 136701, South Korea
Keywords
distributed systems; fault-tolerance; asynchronous checkpointing; causal message logging; recovery
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Causal message logging is an efficient approach to tolerating process failures in distributed systems because it combines the advantages of the pessimistic and optimistic message logging approaches. However, during recovery, traditional causal message logging protocols prevent live processes from continuing their computation and require some synchronous logging to stable storage. Although Elnozahy's protocol solves these problems, it relies on a central recovery leader. Additionally, if it were integrated with asynchronous checkpointing, it could produce inconsistencies in the case of concurrent failures. In this paper we present a new causal message logging protocol with asynchronous checkpointing that needs to maintain only the latest checkpoint of each process and allows live processes to continue their computation during recovery, even under concurrent failures. Moreover, the protocol solves the problems of Elnozahy's protocol and improves asynchrony during recovery because it makes each recovering process responsible only for its own recovery.
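For readers unfamiliar with the general technique, the following is a minimal sketch of textbook causal message logging, not the protocol proposed in this paper. It assumes that each process keeps a volatile log of determinants (records of who delivered which message and in what order) and piggybacks that log on every outgoing message, so that any process causally affected by a delivery also holds its determinant. All names here (`Determinant`, `Message`, `Process`, `collect_determinants`) are hypothetical and chosen for illustration; checkpointing, stable storage, and pruning of the piggybacked log are omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    """Records that `receiver` delivered message `ssn` of `sender` as its `rsn`-th delivery."""
    sender: str
    ssn: int       # send sequence number at the sender
    receiver: str
    rsn: int       # receive sequence number at the receiver


@dataclass
class Message:
    sender: str
    ssn: int
    payload: str
    piggyback: frozenset  # determinants known to the sender at send time


@dataclass
class Process:
    name: str
    ssn: int = 0
    rsn: int = 0
    det_log: set = field(default_factory=set)  # volatile determinant log

    def send(self, payload):
        # Piggyback every known determinant (assumption: no pruning of stable ones).
        self.ssn += 1
        return Message(self.name, self.ssn, payload, frozenset(self.det_log))

    def deliver(self, msg):
        # Merge the sender's determinants, then record this delivery's own determinant.
        self.rsn += 1
        self.det_log |= msg.piggyback
        self.det_log.add(Determinant(msg.sender, msg.ssn, self.name, self.rsn))


def collect_determinants(failed, survivors):
    """Gather the failed process's delivery order from the surviving processes' logs."""
    found = set()
    for p in survivors:
        found |= {d for d in p.det_log if d.receiver == failed}
    return sorted(found, key=lambda d: d.rsn)
```

In this sketch, if p sends to q and q then sends to r, r's log contains the determinant of q's delivery, so q's delivery order can be rebuilt from r after q fails. How the recovering process gathers this information (through a central leader or on its own) is exactly the design point the abstract discusses, and the sketch takes no position on it.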
Pages: 523-528
Number of pages: 6