A communication-induced checkpointing and asynchronous recovery algorithm for multithreaded distributed systems

Cited by: 0
Authors
Tantikul, T [1]
Manivannan, D [1]
Affiliations
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
Keywords
distributed checkpointing; communication-induced checkpointing; fault-tolerance; multithreaded distributed system; asynchronous recovery;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Checkpointing and recovery in traditional distributed systems are relatively well established. However, checkpointing and recovery in multithreaded distributed systems have not been studied in the literature. Applying traditional checkpointing and recovery algorithms to multithreaded systems leads to the false causality problem and high checkpointing overhead. In the proposed algorithm, checkpointing is implemented at the process level to reduce the number of checkpoints, and recovery is implemented at the thread level to minimize the false causality problem. The algorithm also takes advantage of the communication-induced checkpointing method to reduce the message overhead.
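The abstract does not give the algorithm itself, so the following is only a minimal sketch of generic index-based communication-induced checkpointing, not the authors' method. It assumes that each process keeps one checkpoint sequence number shared by all of its threads, that the number is piggybacked on every message, and that a receiver takes a forced checkpoint before delivering a message whose number is higher than its own. The names `Message`, `Process`, `basic_checkpoint`, and the per-thread `delivered` log are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # sender's checkpoint sequence number, piggybacked on the message


@dataclass
class Process:
    pid: int
    sn: int = 0  # sequence number of the latest process-level checkpoint
    checkpoints: list = field(default_factory=list)  # (sn, kind) pairs
    delivered: list = field(default_factory=list)    # (thread_id, payload) pairs

    def basic_checkpoint(self):
        # Checkpoint taken on the process's own schedule.
        self.sn += 1
        self.checkpoints.append((self.sn, "basic"))

    def send(self, payload):
        # Any thread may send; the shared process-level number is attached.
        return Message(payload, self.sn)

    def receive(self, thread_id, msg):
        # Forced checkpoint: taken before delivery when the sender is ahead,
        # so that checkpoints with equal numbers stay mutually consistent.
        if msg.sn > self.sn:
            self.sn = msg.sn
            self.checkpoints.append((self.sn, "forced"))
        # Assumption: delivery is logged per thread so that a recovery step
        # could later reason about dependencies at thread granularity.
        self.delivered.append((thread_id, msg.payload))


p, q = Process(pid=0), Process(pid=1)
p.basic_checkpoint()                 # p moves to sequence number 1
q.receive(thread_id=2, msg=p.send("x"))   # q is behind, takes a forced checkpoint
p.receive(thread_id=0, msg=q.send("y"))   # p is not behind, no new checkpoint
```

In this sketch the only control information is a single integer per message, which is the sense in which communication-induced schemes keep message overhead low; no separate coordination messages are exchanged.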
Pages: 284 - 292
Number of pages: 9
Related papers
50 in total
  • [41] Design and analysis of an efficient algorithm for coordinated checkpointing in distributed systems
    Cao, JN
    Jia, WJ
    Jia, XH
    Cheung, TY
    [J]. ADVANCES IN PARALLEL AND DISTRIBUTED COMPUTING - PROCEEDINGS, 1997, : 261 - 268
  • [42] An index-based checkpointing algorithm for autonomous distributed systems
    Baldoni, R
    Quaglia, F
    Fornara, P
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1999, 10 (02) : 181 - 192
  • [43] An index-based checkpointing algorithm for autonomous distributed systems
    Baldoni, R
    Quaglia, F
    Fornara, P
    [J]. SIXTEENTH SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS, 1997, : 27 - 34
  • [44] An Asynchronous Distributed ADMM Algorithm and Efficient Communication Model
    Fang, Ling
    Lei, Yongmei
    [J]. 2016 IEEE 14TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 14TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 2ND INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/DATACOM/CYBERSC, 2016, : 136 - 140
  • [45] Direct-dependency-based checkpointing and recovery technique for distributed systems
    Shen, L
    Liu, H
    Gu, M
    Gupta, B
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-V, 2000, : 2123 - 2129
  • [46] AN OPTIMALITY PROOF FOR ASYNCHRONOUS RECOVERY ALGORITHMS IN DISTRIBUTED SYSTEMS
    SINGHAL, M
    MATTERN, F
    [J]. INFORMATION PROCESSING LETTERS, 1995, 55 (03) : 117 - 121
  • [47] An efficient non-intrusive checkpointing algorithm for distributed database systems
    Wu, Jiang
    Manivannan, D.
    [J]. DISTRIBUTED COMPUTING AND NETWORKING, PROCEEDINGS, 2006, 4308 : 82 - 87
  • [48] Slotted-FIFO communication for asynchronous distributed systems
    Baldoni, R
    Beraldi, R
    Prakash, R
    [J]. COMPUTER JOURNAL, 1998, 41 (05) : 337 - 348
  • [49] A discrete-event systems approach to communication induced checkpointing
    Ricker, SL
    [J]. WODES'02: SIXTH INTERNATIONAL WORKSHOP ON DISCRETE EVENT SYSTEMS, PROCEEDINGS, 2002, : 69 - 74
  • [50] EVALUATION OF COMMUNICATION INDUCED CHECKPOINTING IN RESOURCE CONSTRAINED EMBEDDED SYSTEMS
    Sababha, Belal H.
    Rawashdeh, Osamah A.
    [J]. PROCEEDINGS OF THE ASME INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2011, VOL 3, PTS A AND B, 2012, : 39 - 45