A distributed fault-tolerant asynchronous algorithm for performing N tasks

Cited by: 0
Authors
Weerasinghe, GM [1]
Lipsky, L [1]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
Keywords
Networks of Workstations; message passing; performance evaluation; fault-tolerance; asynchronous communication; dynamic load balancing
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper is a performance study of a fault-tolerant asynchronous algorithm for performing N independent and idempotent tasks on P processes. It is designed for the Single Program Multiple Data (SPMD) programming model and for the fail-stop failure model without restarts. The algorithm tolerates up to P - 1 process failures; that is, at least one process must survive for the lifetime of the application. The algorithm is structured as a Symmetric Task Model in which each process is responsible for scheduling tasks dynamically and for distributing progress information. A parameter called Periodicity controls how often a process distributes its progress information to the other processes. A process can fail while distributing its progress information, which causes inconsistencies between the task partitions held by different processes. The major design goals are therefore to optimize the scheduling phase so that the number of tasks redone is minimized in the presence of failures and communication time-outs, and to minimize the allocation of resources. The study avoids checkpointing: lost tasks are simply redone. Processes communicate only through asynchronous message passing. We present preliminary results of performance tests of our implementation of this algorithm.
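The abstract does not give the scheduling rule or the message format, so the following is only a rough, sequential simulation of the idea it describes, not the authors' algorithm. It assumes that each process scans the task range from its own starting offset, that a broadcast carries the sender's whole set of known-finished tasks, and that delivery is instantaneous. The names `Process`, `simulate`, and `fail_at` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    known_done: set = field(default_factory=set)  # local view of finished tasks
    since_broadcast: int = 0                      # tasks done since last broadcast
    alive: bool = True


def simulate(n_tasks, n_procs, periodicity, fail_at=None):
    """Round-based simulation; returns (completed_tasks, total_executions).

    fail_at maps pid -> round at which that process fail-stops
    (a hypothetical failure schedule; failed processes never restart).
    """
    fail_at = fail_at or {}
    procs = [Process(pid) for pid in range(n_procs)]
    completed = set()   # ground truth, used only for measurement
    executions = 0
    round_no = 0
    while True:
        progressed = False
        for p in procs:
            if p.alive and fail_at.get(p.pid) == round_no:
                p.alive = False  # fail-stop: progress not yet broadcast is lost
            if not p.alive:
                continue
            # Assumed scheduling rule: scan from a per-process offset so that
            # processes begin in different regions of the task range.
            start = (p.pid * n_tasks) // n_procs
            task = next(
                (t % n_tasks for t in range(start, start + n_tasks)
                 if t % n_tasks not in p.known_done),
                None,
            )
            if task is None:
                continue  # this process believes every task is finished
            p.known_done.add(task)
            completed.add(task)
            executions += 1
            progressed = True
            p.since_broadcast += 1
            # Periodicity: distribute progress after this many local tasks.
            if p.since_broadcast >= periodicity:
                for q in procs:
                    if q.alive and q is not p:
                        q.known_done |= p.known_done
                p.since_broadcast = 0
        if not progressed:
            break
        round_no += 1
    return completed, executions


if __name__ == "__main__":
    done, runs = simulate(n_tasks=12, n_procs=3, periodicity=4,
                          fail_at={0: 2})
    print(len(done), "tasks completed with", runs, "executions")
```

In this simplified model, a periodicity of 1 with no failures runs every task exactly once, because each process learns of all finished tasks before choosing its next one. Larger periodicity values or injected failures leave survivors with stale views, so `executions` can exceed `n_tasks`; the difference is the number of redone tasks that the paper's design aims to minimize.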
Pages: 69-73
Number of pages: 5