A Delayed Checkpoint Approach for Communication-induced Checkpointing in Autonomic Computing

Cited by: 3
Authors
Calixto Simon, Alberto [1]
Pomares Hernandez, Saul E. [1]
Perez Cruz, Jose Roberto [1]
Affiliation
[1] INAOE, Puebla 72840, Mexico
Keywords
Distributed Systems; Communication-induced Checkpointing; Autonomic Computing; Systems
DOI
10.1109/WETICE.2013.15
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Although the Autonomic Computing initiative was introduced a dozen years ago, several challenges remain open. One of them is efficient runtime monitoring aimed at detecting, diagnosing, and repairing problems caused by failures or bugs in software and/or hardware components. Communication-induced Checkpointing (CIC) can be a useful tool for this purpose. CIC has been used to address a wide range of problems in distributed systems, such as rollback recovery, software debugging, and software verification. In CIC algorithms, each autonomic component (process) cooperates asynchronously with the others by attaching to application messages control information about its saved local states, called checkpoints. CIC aims to form consistent global snapshots by grouping checkpoints (one per component) without coordination. To achieve this, CIC solutions continuously monitor the exchanged control information to identify potentially dangerous checkpointing patterns. When such a pattern is identified, it is broken by locally triggering a forced checkpoint. However, as we will show, not all forced checkpoints triggered by current solutions are necessary. In this paper, we present a delayed checkpoint approach, suitable for autonomic computing, that reduces the number of forced checkpoints by establishing triggering rules that we call safe checkpoint conditions. Finally, we present results showing that our proposal is more efficient than other current solutions.
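The abstract describes the CIC mechanism only in prose and does not state the paper's safe checkpoint conditions. The minimal Python sketch below illustrates the general idea under stated assumptions: it implements a classic index-based CIC rule, in which each process piggybacks its current checkpoint index on every application message and takes a forced checkpoint before delivering a message that carries a larger index. For any index k, each process's first checkpoint with index at least k then belongs to one consistent global snapshot. The names `Process`, `Message`, `recovery_line`, and the `avoid_when_no_send` switch are hypothetical. The switch shows one simple way a forced checkpoint can turn out to be unnecessary (relabelling the last checkpoint when nothing has been sent since it); it is an assumption for illustration, not the authors' delayed checkpoint approach.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # sender's current checkpoint index, piggybacked on the message


@dataclass
class Process:
    pid: int
    # Hypothetical switch: skip a forced checkpoint when nothing was sent
    # since the last checkpoint (illustration only, not the paper's rules).
    avoid_when_no_send: bool = False
    sn: int = 0  # index of the most recent local checkpoint
    # Each checkpoint is recorded as (kind, index); every process starts with one.
    checkpoints: list = field(default_factory=lambda: [("initial", 0)])
    sent_since_checkpoint: bool = False
    forced: int = 0  # number of forced checkpoints taken so far

    def _take_checkpoint(self, kind: str) -> None:
        self.checkpoints.append((kind, self.sn))
        self.sent_since_checkpoint = False

    def basic_checkpoint(self) -> None:
        """Independent local checkpoint: advance the index, then save."""
        self.sn += 1
        self._take_checkpoint("basic")

    def send(self, payload: str) -> Message:
        """Piggyback the current index on an outgoing application message."""
        self.sent_since_checkpoint = True
        return Message(payload, self.sn)

    def receive(self, msg: Message) -> str:
        """Apply the index rule before delivering the message."""
        if msg.sn > self.sn:
            self.sn = msg.sn
            if self.avoid_when_no_send and not self.sent_since_checkpoint:
                # No message left this process since its last checkpoint, so
                # moving the new index onto that checkpoint orphans no message.
                kind, _ = self.checkpoints[-1]
                self.checkpoints[-1] = (kind, self.sn)
            else:
                # Dangerous pattern: force a checkpoint before delivery.
                self.forced += 1
                self._take_checkpoint("forced")
        return msg.payload  # delivered to the application


def recovery_line(procs, k):
    """For index k, pick each process's first checkpoint with index >= k.

    Returns {pid: position in that process's checkpoint list}, or None for a
    process that has not yet reached index k.
    """
    return {
        p.pid: next((i for i, (_, idx) in enumerate(p.checkpoints) if idx >= k), None)
        for p in procs
    }


p0, p1 = Process(0), Process(1)
p0.basic_checkpoint()          # p0 now has index 1
p1.receive(p0.send("hello"))   # 1 > 0, so p1 takes a forced checkpoint first
print(p1.forced, recovery_line([p0, p1], 1))  # prints: 1 {0: 1, 1: 1}
```

With `Process(1, avoid_when_no_send=True)`, the same exchange leaves `forced` at 0, because p1 has sent nothing since its initial checkpoint and that checkpoint is simply relabelled with index 1.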
Pages: 56-61
Number of pages: 6