Communication-induced determination of consistent snapshots

Cited by: 28
Authors
Hélary, JM [1]
Mostefaoui, A [1]
Raynal, M [1]
Affiliations
[1] Inst Rech Informat & Syst Aleatoires, F-35042 Rennes, France
Keywords
asynchronous distributed computation; checkpointing; communication-induced protocol; consistency; global checkpoint; message recording; snapshot
DOI
10.1109/71.798312
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
A classical way to determine consistent snapshots is to use the Chandy-Lamport algorithm. This algorithm relies on specific control messages that allow processes to synchronize local checkpoint determination and message recording so that the resulting snapshot is consistent. This paper investigates a communication-induced approach to determining consistent snapshots. In such an approach, control information is carried by application messages. Two abstract necessary and sufficient conditions are stated: one associated with global checkpoint consistency, the other with message recording. A general protocol is derived from these abstract conditions. This general protocol can be instantiated in distinct ways, giving rise to a family of communication-induced snapshot protocols. It also shows that there is an intrinsic trade-off between the number of forced checkpoints and the number of recorded messages. Finally, a particular instantiation of the general protocol is provided.
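To make the idea of control information carried by application messages concrete, the following is a minimal sketch of a well-known index-based communication-induced checkpointing scheme. It is not the general protocol or the instantiation proposed in the paper, and it leaves out message recording entirely. The names `Message`, `Process`, `sn`, `basic_checkpoint`, `send`, and `receive` are assumptions chosen for illustration. Each process keeps a checkpoint index, attaches it to every application message, and takes a forced checkpoint before delivering a message whose index is ahead of its own.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # checkpoint index piggybacked on the application message


@dataclass
class Process:
    name: str
    sn: int = 0  # index of this process's latest local checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs
    delivered: list = field(default_factory=list)

    def __post_init__(self):
        # Every process starts from an initial checkpoint with index 0.
        self.checkpoints.append((0, "initial"))

    def basic_checkpoint(self):
        # A checkpoint the process takes on its own initiative.
        self.sn += 1
        self.checkpoints.append((self.sn, "basic"))

    def send(self, payload):
        # No separate control message: the index travels with the payload.
        return Message(payload, self.sn)

    def receive(self, msg):
        # If the sender is ahead, take a forced checkpoint BEFORE delivery,
        # so the message is not received "before" a checkpoint it was
        # sent "after".
        if msg.sn > self.sn:
            self.sn = msg.sn
            self.checkpoints.append((self.sn, "forced"))
        self.delivered.append(msg.payload)
```

In this sketch, a forced checkpoint is the price paid for not sending control messages; the trade-off against recorded messages described in the abstract is outside what the sketch models.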
Pages: 865-877
Number of pages: 13