A decentralized and fault tolerant convergence detection algorithm for asynchronous iterative algorithms

Cited by: 1
Authors
Charr, Jean-Claude [1]
Couturier, Raphael [1]
Laiymani, David [1]
Affiliation
[1] Univ Franche Comte, Lab Comp Sci Franche Comte, IUT Belfort Montbeliard, F-90016 Belfort, France
Source
JOURNAL OF SUPERCOMPUTING | 2010 / Vol. 53 / No. 02
Keywords
Decentralized global convergence detection mechanism; Peer-to-Peer environment; Distributed clusters; Fault tolerance
DOI
10.1007/s11227-009-0293-6
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
This article presents an algorithm that performs a decentralized detection of the global convergence of parallel asynchronous iterative applications. The algorithm is fault tolerant: it runs a decentralized saving procedure that, after a node crashes, allows the dead node to be replaced by a new one that resumes the computation from the last checkpoint. Combined with the advantages of the asynchronous iteration model, this method allows us to solve very large-scale problems on highly volatile parallel architectures such as peer-to-peer networks and distributed clusters. We also present the implementation of this algorithm in the JaceP2P platform, which is dedicated to designing and executing parallel asynchronous iterative applications in volatile environments. Numerous experiments demonstrate the robustness and efficiency of our algorithm.
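The abstract combines two mechanisms: asynchronous iterations whose global convergence must be detected, and decentralized checkpoints that let a replacement node resume a crashed node's work. The Python sketch below is a toy illustration under stated assumptions, not the JaceP2P implementation or the paper's protocol. The test problem (the ring system x_i = 0.25(x_{i-1} + x_{i+1}) + b_i), the class `Peer`, the function `simulate`, and all parameters are hypothetical. Each peer updates in random order from possibly stale neighbor values, with lossy messages standing in for network delays. Each peer also stores a checkpoint on its right neighbor, and a crashed peer is replaced by a fresh one that restarts from that checkpoint. To keep the sketch short, convergence is tested globally by the simulator; the paper's decentralized detection is not reproduced.

```python
import random


class Peer:
    """Hypothetical peer owning one unknown x_i of the ring system
    x_i = 0.25 * (x_{i-1} + x_{i+1}) + b_i (a contraction with factor 0.5)."""

    def __init__(self, idx: int, n: int):
        self.idx = idx
        self.left, self.right = (idx - 1) % n, (idx + 1) % n
        self.x = 0.0
        # Last neighbor values this peer has *received*; they may be stale.
        self.view = {self.left: 0.0, self.right: 0.0}
        self.updates = 0
        # Latest checkpoint this peer stores on behalf of its left neighbor.
        self.stored_checkpoint = None

    def iterate(self, b: float) -> None:
        # Asynchronous update: uses whatever neighbor values have arrived so far.
        self.x = 0.25 * (self.view[self.left] + self.view[self.right]) + b
        self.updates += 1


def simulate(n=8, b=1.0, eps=1e-10, deliver_p=0.7, ckpt_every=5,
             crash_at=40, crash_idx=3, max_steps=200_000, seed=0):
    rng = random.Random(seed)
    peers = [Peer(i, n) for i in range(n)]
    for step in range(max_steps):
        if step == crash_at:
            # The peer crashes and loses its state; a fresh peer takes its place
            # and restarts from the last checkpoint held by its right neighbor.
            fresh = Peer(crash_idx, n)
            saved = peers[(crash_idx + 1) % n].stored_checkpoint
            if saved is not None:
                fresh.x, fresh.view, fresh.updates = saved[0], dict(saved[1]), saved[2]
            peers[crash_idx] = fresh

        p = peers[rng.randrange(n)]  # no barrier: peers update in random order
        p.iterate(b)
        # Send the new value; each message is delivered with probability deliver_p.
        for j in (p.left, p.right):
            if rng.random() < deliver_p:
                peers[j].view[p.idx] = p.x
        # Decentralized saving: store a checkpoint on the right neighbor.
        if p.updates % ckpt_every == 0:
            peers[p.right].stored_checkpoint = (p.x, dict(p.view), p.updates)

        # Simplification: a global residual check done by the simulator, standing
        # in for a decentralized convergence-detection protocol.
        residual = max(abs(0.25 * (peers[q.left].x + peers[q.right].x) + b - q.x)
                       for q in peers)
        if residual < eps:
            return step + 1, [q.x for q in peers]
    raise RuntimeError("no convergence within max_steps")
```

With b_i = 1 the fixed point is x_i = 2 for every peer, so `steps, xs = simulate()` should return values within about 2·eps of 2.0 even though one peer crashed mid-run. The global test is sound for this system because max|x_i - 2| ≤ 2·(max residual); a truly decentralized detector has to reach the same conclusion without such a global snapshot, which is the problem the article addresses.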
Pages: 269-292
Page count: 24