TASK ALLOCATION AND REALLOCATION FOR FAULT-TOLERANCE IN MULTICOMPUTER SYSTEMS

Cited by: 6
Authors
CHEN, CIH [1]
CHERKASSKY, V [1]
Affiliation
[1] UNIV MINNESOTA, DEPT ELECT ENGN, MINNEAPOLIS, MN 55455
DOI
10.1109/7.328753
CLC classification number
V [Aviation, Aerospace]
Subject classification codes
08; 0825
Abstract
The goal of task allocation in a set of interconnected processors (computers) is to maximize the efficient use of resources and thus reduce the job turnaround time. Proposed here is a simple yet effective method for allocating tasks in multicomputer systems that minimizes the interprocessor communication cost subject to resource limitations defined by the system and the designer. These limitations can be viewed as the result of load balancing, since the execution time of each task, the number of available processors, the processor speed, and the memory capacity are known to the system or the designer. As the number of processors increases, the probability that a failure exists somewhere in the system at any given time also increases, yet very few established task allocation models have considered reliability. In multicomputer systems, we define system reliability as the probability that the system can run the tasks successfully. After the (nonredundant) task scheduling strategy is defined, tasks are reallocated to processors statically and redundantly. This is a form of time redundancy: if some processors fail during execution, all tasks can still be completed on the remaining processors, though over a longer time. Because tasks are preallocated statically, this method is simpler, and thus more practical, than the well-known dynamic reconfiguration and rollback recovery techniques for multicomputer systems. We demonstrate the effectiveness of task allocation and reallocation for hardware fault tolerance by applying the methods to several examples and to practical communications-network multiprocessor systems.
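The two ideas in the abstract, allocation that minimizes interprocessor communication cost under resource limits and reliability as the probability that every task can still run, can be illustrated with a toy sketch. It is not the authors' algorithm. The data (`TASK_MEM`, `PROC_MEM`, `COMM`) and the helper names (`comm_cost`, `feasible`, `best_allocation`, `reliability`) are hypothetical. Exhaustive search stands in for the paper's allocation method, and only the memory limit is modeled. The reliability model assumes processors fail independently and that a surviving processor can absorb its backup tasks (ignoring its resource limits after a failure).

```python
from itertools import product

# Hypothetical toy data, not taken from the paper.
TASK_MEM = [2, 1, 3, 2]   # memory each task needs
PROC_MEM = [4, 5, 4]      # memory capacity of each processor
# COMM[i][j]: cost paid only when tasks i and j sit on different processors
COMM = [
    [0, 4, 1, 0],
    [4, 0, 0, 2],
    [1, 0, 0, 5],
    [0, 2, 5, 0],
]


def comm_cost(assign):
    """Interprocessor communication cost of a task -> processor assignment."""
    n = len(assign)
    return sum(COMM[i][j] for i in range(n) for j in range(i + 1, n)
               if assign[i] != assign[j])


def feasible(assign):
    """Check that no processor exceeds its memory capacity."""
    used = [0] * len(PROC_MEM)
    for task, proc in enumerate(assign):
        used[proc] += TASK_MEM[task]
    return all(u <= cap for u, cap in zip(used, PROC_MEM))


def best_allocation():
    """Exhaustive search: fine for toy sizes, exponential in general."""
    candidates = (a for a in product(range(len(PROC_MEM)), repeat=len(TASK_MEM))
                  if feasible(a))
    return min(candidates, key=comm_cost)


def reliability(placement, p_fail):
    """Probability that every task keeps at least one working host.

    placement[t] is the set of processors holding a copy of task t;
    processor k fails independently with probability p_fail[k].
    """
    total = 0.0
    for up in product([True, False], repeat=len(p_fail)):
        prob = 1.0
        for k, alive in enumerate(up):
            prob *= (1 - p_fail[k]) if alive else p_fail[k]
        if all(any(up[k] for k in hosts) for hosts in placement):
            total += prob
    return total


if __name__ == "__main__":
    best = best_allocation()
    print(best, comm_cost(best))  # → (0, 0, 1, 1) 3
    p = [0.1, 0.1, 0.1]
    # Nonredundant: each task has exactly one host.
    print(reliability([{k} for k in best], p))
    # Static redundant reallocation: every task also has a backup on processor 2.
    print(reliability([{0, 2}, {0, 2}, {1, 2}, {1, 2}], p))
```

With a 10% failure probability per processor, the nonredundant allocation survives with probability 0.81, while adding static backups on processor 2 raises it to 0.981. The extra copies cost nothing until a failure occurs, at which point the backup processor runs more work and the job takes longer, which is the time-redundancy trade-off the abstract describes.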
Pages: 1094-1104
Page count: 11