A decentralized and fault tolerant convergence detection algorithm for asynchronous iterative algorithms

Cited by: 1
Authors
Charr, Jean-Claude [1]
Couturier, Raphael [1]
Laiymani, David [1]
Affiliations
[1] Univ Franche Comte, Lab Comp Sci Franche Comte, IUT Belfort Montbeliard, F-90016 Belfort, France
Source
JOURNAL OF SUPERCOMPUTING | 2010 / Vol. 53 / Issue 02
Keywords
Decentralized global convergence detection mechanism; Peer-to-Peer environment; Distributed clusters; Fault tolerance;
DOI
10.1007/s11227-009-0293-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This article presents an algorithm for decentralized detection of the global convergence of parallel asynchronous iterative applications. The algorithm is fault tolerant: it runs a decentralized saving procedure so that, after a node crashes, the dead node can be replaced by a new one that continues the computing task from the last checkpoint. Combined with the advantages of the asynchronous iteration model, this method makes it possible to compute very large scale problems on highly volatile parallel architectures such as Peer-to-Peer and distributed cluster architectures. We also present the implementation of this algorithm in the JaceP2P platform, which is dedicated to designing and executing parallel asynchronous iterative applications in volatile environments. Numerous experiments show the robustness and efficiency of our algorithm.
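The checkpoint-and-replace idea described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions and is not the authors' algorithm or the JaceP2P API: the `Node` class, the `run` function, the toy update rule, and the choice of storing each checkpoint on the right-hand neighbour are all hypothetical. The final conjunction of local convergence flags is a centralized stand-in used for brevity; it does not reproduce the decentralized detection protocol of the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker holding one unknown x_i of a fixed-point problem."""
    idx: int
    value: float = 0.0
    iteration: int = 0
    locally_converged: bool = False
    backups: dict = field(default_factory=dict)  # checkpoints kept for other nodes

    def step(self, left, right, eps):
        # Toy contraction x_i <- 0.25 * (x_left + x_right) + 1, whose fixed
        # point is x_i = 2 for every node on the ring.
        new = 0.25 * (left.value + right.value) + 1.0
        self.locally_converged = abs(new - self.value) < eps
        self.value = new
        self.iteration += 1


def run(n=4, eps=1e-10, checkpoint_every=5, crash=(7, 2), max_sweeps=1000):
    """Iterate on a ring of n nodes; crash=(sweep, node index) or None.

    Assumes the crash happens after the dead node has saved at least one
    checkpoint (otherwise the lookup below raises KeyError).
    """
    nodes = [Node(i) for i in range(n)]
    restored_iteration = None
    for sweep in range(1, max_sweeps + 1):
        for i in range(n):
            node, right = nodes[i], nodes[(i + 1) % n]
            node.step(nodes[(i - 1) % n], right, eps)
            if node.iteration % checkpoint_every == 0:
                # Save (iteration, value) on the neighbour so it survives a crash.
                right.backups[i] = (node.iteration, node.value)
        if crash is not None and sweep == crash[0]:
            dead = crash[1]
            # A fresh node replaces the dead one and resumes from the last
            # checkpoint held by the neighbour; its own backups are lost.
            saved_iter, saved_value = nodes[(dead + 1) % n].backups[dead]
            nodes[dead] = Node(dead, value=saved_value, iteration=saved_iter)
            restored_iteration = saved_iter
        # Stand-in for global detection: every local flag must be set.
        if all(node.locally_converged for node in nodes):
            return [node.value for node in nodes], sweep, restored_iteration
    return [node.value for node in nodes], max_sweeps, restored_iteration
```

Calling `values, sweeps, restored = run()` simulates one crash of node 2 during sweep 7. The replacement resumes from an older iteration than its neighbours, which is tolerable here because the update is a contraction and does not require all nodes to be at the same iteration count.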
Pages: 269 - 292
Page count: 24
Related papers
50 in total
  • [21] A fault-tolerant algorithm for decentralized on-line quorum adaptation
    Bearden, M
    Bianchini, RP
    TWENTY-EIGHTH ANNUAL INTERNATIONAL SYMPOSIUM ON FAULT-TOLERANT COMPUTING, DIGEST PAPERS, 1998, : 262 - 271
  • [22] A distributed fault-tolerant asynchronous algorithm for performing N tasks
    Weerasinghe, GM
    Lipsky, L
    COMPUTERS AND THEIR APPLICATIONS, 2001, : 69 - 73
  • [23] Decentralized, Asynchronous Algorithms and Social Systems
    Cybenko, George
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 1795 - 1795
  • [24] JACEP2P-V2: A Fully Decentralized and Fault Tolerant Environment for Executing Parallel Iterative Asynchronous Applications on Volatile Distributed Architectures
    Charr, Jean-Claude
    Couturier, Raphael
    Laiymani, David
    ADVANCES IN GRID AND PERVASIVE COMPUTING, PROCEEDINGS, 2009, 5529 : 446 - 458
  • [25] JACEP2P-V2: A fully decentralized and fault tolerant environment for executing parallel iterative asynchronous applications on volatile distributed architectures
    Charr, Jean-Claude
    Couturier, Raphael
    Laiymani, David
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2011, 27 (05): : 606 - 613
  • [26] JACEP2P-V2: A fully decentralized and fault tolerant environment for executing parallel iterative asynchronous applications on volatile distributed architectures
    Laboratory of Computer Sciences, University of Franche-Comté , IUT de Belfort-Montbéliard, Rue Engel Gros, BP 527, 90016 Belfort, France
    Future Gener Comput Syst, 5 (606-613):
  • [27] FAULT-TOLERANT ASYNCHRONOUS NETWORKS
    PRADHAN, DK
    REDDY, SM
    IEEE TRANSACTIONS ON COMPUTERS, 1973, C 22 (07) : 662 - 669
  • [28] Fault Tolerant Decentralized Scheduling Algorithm for P2P Grid
    Chauhan, Piyush
    Nitin
    2ND INTERNATIONAL CONFERENCE ON COMMUNICATION, COMPUTING & SECURITY [ICCCS-2012], 2012, 1 : 698 - 707
  • [29] A Fully Asynchronous and Fault Tolerant Distributed Algorithm to Compute a Minimum Graph Orientation
    Gillet, Noel
    Hanusse, Nicolas
    STABILIZATION, SAFETY, AND SECURITY OF DISTRIBUTED SYSTEMS, SSS 2017, 2018, 10616 : 308 - 322
  • [30] Convergence Rate Analysis of a Fault-Tolerant Distributed Consensus Algorithm
    Haseltalab, Ali
    Akar, Mehmet
    2015 54TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2015, : 5111 - 5116