An index-based checkpointing algorithm for autonomous distributed systems

Cited by: 41
Authors
Baldoni, R [1]
Quaglia, F [1]
Fornara, P [1]
Affiliation
[1] Univ Roma La Sapienza, Dipartimento Informat & Sistemist, I-00198 Rome, Italy
Keywords
checkpointing; causal dependency; protocols; timestamp management; global snapshot; fault tolerance; rollback-recovery; distributed systems; performance evaluation
DOI
10.1109/71.752783
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents an index-based checkpointing algorithm for distributed systems with the aim of reducing the total number of checkpoints while ensuring that each checkpoint belongs to at least one consistent global checkpoint (or recovery line). The algorithm is based on an equivalence relation defined between pairs of successive checkpoints of a process which allows us, in some cases, to advance the recovery line of the computation without forcing checkpoints in other processes. The algorithm is well-suited for autonomous and heterogeneous environments, where each process does not know any private information about other processes and private information of the same type of distinct processes is not related (e.g., clock granularity, local checkpointing strategy, etc.). We also present a simulation study which compares the checkpointing-recovery overhead of this algorithm to the ones of previous solutions.
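The abstract does not spell out the algorithm itself. As background only, the following is a minimal sketch of the baseline index-based rule that protocols of this family build on: each process numbers its checkpoints, piggybacks the current index on every message, and takes a forced checkpoint when it receives a higher index. This is not the equivalence-based algorithm of the paper, whose relation is meant to avoid some of these forced checkpoints; the names `Process`, `sn`, `basic_checkpoint`, `send`, and `receive` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One process following a baseline index-based checkpointing rule (illustrative)."""
    name: str
    sn: int = 0                               # index of the latest local checkpoint
    log: list = field(default_factory=list)   # (index, kind) for each checkpoint taken

    def __post_init__(self):
        # Every process starts with an initial checkpoint of index 0.
        self.log.append((self.sn, "initial"))

    def basic_checkpoint(self):
        """Checkpoint taken on the process's own schedule: increment the index."""
        self.sn += 1
        self.log.append((self.sn, "basic"))

    def send(self):
        """Return the control information piggybacked on an outgoing message."""
        return self.sn

    def receive(self, piggybacked_sn):
        """Take a forced checkpoint before delivery if the sender's index is ahead."""
        if piggybacked_sn > self.sn:
            self.sn = piggybacked_sn
            self.log.append((self.sn, "forced"))


p = Process("p")
q = Process("q")
p.basic_checkpoint()   # p moves to index 1
q.receive(p.send())    # q is behind, so it takes a forced checkpoint with index 1
p.receive(q.send())    # indices match, so no checkpoint is forced
print(p.log)
print(q.log)
```

In this baseline, every basic checkpoint that raises the index can force a checkpoint in each process that later hears from the sender, which is the overhead the abstract says the paper aims to reduce.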
Pages: 181-192
Page count: 12
Related papers
  • [1] An index-based checkpointing algorithm for autonomous distributed systems
    Baldoni, R
    Quaglia, F
    Fornara, P
    [J]. SIXTEENTH SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS, 1997, : 27 - 34
  • [2] An index-based checkpointing/recovery approach for distributed systems
    Gupta, B
    Banerjee, SK
    Wang, Z
    [J]. COMPUTERS AND THEIR APPLICATIONS, 2001, : 166 - 170
  • [3] An Index-Based Mobile Checkpointing and Recovery Algorithm
    Singh, Awadhesh Kumar
    Bhat, Rohit
    Kumar, Anshul
    [J]. DISTRIBUTED COMPUTING AND NETWORKING, 2009, 5408 : 200 - 205
  • [4] An improved scheme of index-based checkpointing
    Luo, YS
    Min, YH
    Zhang, DF
    [J]. 11TH PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING, PROCEEDINGS, 2005, : 167 - 171
  • [5] Performance comparisons of index-based communication-induced checkpointing protocols
    Tsai, Jichiang
    [J]. JOURNAL OF THE CHINESE INSTITUTE OF ENGINEERS, 2006, 29 (06) : 1113 - 1118
  • [6] An efficient and scalable checkpointing and recovery algorithm for distributed systems
    Kumar, K. P. Krishna
    Hansdah, R. C.
    [J]. DISTRIBUTED COMPUTING AND NETWORKING, PROCEEDINGS, 2006, 4308 : 94 - 99
  • [7] Design and analysis of an efficient algorithm for coordinated checkpointing in distributed systems
    Cao, JN
    Jia, WJ
    Jia, XH
    Cheung, TY
    [J]. ADVANCES IN PARALLEL AND DISTRIBUTED COMPUTING - PROCEEDINGS, 1997, : 261 - 268
  • [8] A Scalable Communication-Induced Checkpointing Algorithm for Distributed Systems
    Simon, Alberto Calixto
    Hernandez, Saul E. Pomares
    Cruz, Jose Roberto Perez
    Gomez-Gil, Pilar
    Drira, Khalil
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (04) : 886 - 896
  • [9] An efficient index-based checkpointing protocol with constant-size control information on messages
    Tsai, JC
    [J]. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2005, 2 (04) : 287 - 296
  • [10] Index-based query processing on distributed multidimensional data
    Tsatsanifos, George
    Sacharidis, Dimitris
    Sellis, Timos
    [J]. GEOINFORMATICA, 2013, 17 (03) : 489 - 519