A communication-induced checkpointing and asynchronous recovery algorithm for multithreaded distributed systems

Times cited: 0

Authors:

Tantikul, T [1]

Manivannan, D [1]

Affiliation:

[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA

Source:

PARALLEL AND DISTRIBUTED COMPUTING: APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS | 2004 / Vol. 3320

Keywords:

distributed checkpointing; communication-induced checkpointing; fault-tolerance; multithreaded distributed system; asynchronous recovery

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Subject classification code:

081202;

Abstract:

Checkpointing and recovery in traditional distributed systems are relatively well established. However, checkpointing and recovery in multithreaded distributed systems have not been studied in the literature. Using traditional checkpointing and recovery algorithms in multithreaded systems leads to the false causality problem and high checkpointing overhead. The checkpointing algorithm is implemented at the process level to reduce the number of checkpoints, and the recovery algorithm is implemented at the thread level, which minimizes the false causality problem. The algorithm also takes advantage of the communication-induced checkpointing method to reduce the message overhead.
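
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general communication-induced checkpointing idea it refers to, not the authors' protocol. It assumes a classic index-based rule: each process piggybacks a checkpoint sequence number on every message, and a receiver takes a forced checkpoint before delivery when the piggybacked number is larger than its own. No extra coordination messages are exchanged. The names `Process`, `basic_checkpoint`, `send`, and `receive` are hypothetical, and the process-level versus thread-level split described in the abstract is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process using an index-based forced-checkpoint rule."""
    name: str
    sn: int = 0  # current checkpoint sequence number
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs
    delivered: list = field(default_factory=list)    # payloads delivered so far

    def __post_init__(self):
        # Every process starts from an initial checkpoint with index 0.
        self.checkpoints.append((0, "initial"))

    def basic_checkpoint(self):
        # A checkpoint taken independently by the process (e.g. periodically).
        self.sn += 1
        self.checkpoints.append((self.sn, "basic"))

    def send(self, payload):
        # Piggyback the sequence number on the application message.
        return (payload, self.sn)

    def receive(self, message):
        payload, sender_sn = message
        if sender_sn > self.sn:
            # Forced checkpoint, taken before the message is delivered.
            self.sn = sender_sn
            self.checkpoints.append((self.sn, "forced"))
        self.delivered.append(payload)


p, q = Process("p"), Process("q")
p.basic_checkpoint()          # p advances to index 1
q.receive(p.send("hello"))    # q is behind, so it takes a forced checkpoint
q.receive(p.send("again"))    # q has caught up, so no new checkpoint
print(q.checkpoints)
```

Under this assumed rule, the only control information is the integer carried on each application message, which is the sense in which communication-induced schemes keep message overhead low.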

Pages: 284-292

Page count: 9
