Replicated process allocation for load distribution in fault-tolerant multicomputers

Citations: 6
Authors
Kim, J [1 ]
Lee, H [1 ]
Lee, S [1 ]
Affiliation
[1] POHANG UNIV SCI & TECHNOL, DEPT ELECT ENGN, POHANG 790784, SOUTH KOREA
Keywords
backup process; checkpointing; fault-tolerant multicomputer; load balancing; process allocation
DOI
10.1109/12.588067
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper considers a load-balancing process allocation method for fault-tolerant multicomputer systems that balances the load both before and after faults begin to degrade system performance. To tolerate a single fault, each process (the primary process) is duplicated, i.e., it has a backup process. The backup executes on a different processor from the primary, checkpoints the primary process, and recovers the process if the primary fails. We formalize the problem of load-balancing process allocation, propose a new process allocation method, and analyze its performance. Simulations compare the proposed method with a process allocation method that does not take into account the different load characteristics of primary and backup processes. While both methods perform well before a fault occurs, only the proposed method maintains a balanced load afterward.
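To make the primary/backup load distinction concrete, the following is a minimal illustrative sketch, not the method proposed in the paper (the abstract does not give its details). It assumes a simple greedy heuristic: each process has a hypothetical `primary_load` and a smaller `backup_load`, a backup that takes over after a fault runs at the full primary load, and backups are placed so that the load inherited from any single failed processor is spread across the survivors. The function names `allocate` and `loads_after_fault` are invented for this example.

```python
def allocate(primary_load, backup_load, n_procs):
    """Greedy sketch (an assumption, not the paper's algorithm).

    Returns (primary_host, backup_host), two lists indexed by process.
    """
    m = len(primary_load)
    load = [0.0] * n_procs  # fault-free load on each processor
    # extra[f][q]: additional load that lands on q if processor f fails
    extra = [[0.0] * n_procs for _ in range(n_procs)]
    primary_host = [None] * m
    backup_host = [None] * m

    # Place the heaviest primaries first, each on the least-loaded processor.
    order = sorted(range(m), key=lambda i: -primary_load[i])
    for i in order:
        p = min(range(n_procs), key=lambda q: load[q])
        primary_host[i] = p
        load[p] += primary_load[i]

    # Place each backup on a different processor, choosing the one whose
    # load would be smallest if this process's primary host failed.
    for i in order:
        p = primary_host[i]
        gain = primary_load[i] - backup_load[i]  # load increase on takeover
        candidates = [q for q in range(n_procs) if q != p]
        q = min(
            candidates,
            key=lambda c: (load[c] + backup_load[i] + extra[p][c] + gain, load[c]),
        )
        backup_host[i] = q
        load[q] += backup_load[i]
        extra[p][q] += gain
    return primary_host, backup_host


def loads_after_fault(primary_load, backup_load, primary_host, backup_host,
                      n_procs, failed):
    """Per-processor load after the single processor `failed` goes down."""
    load = [0.0] * n_procs
    for i in range(len(primary_load)):
        p, q = primary_host[i], backup_host[i]
        if p == failed:
            load[q] += primary_load[i]   # backup takes over as primary
        elif q == failed:
            load[p] += primary_load[i]   # primary continues without a backup
        else:
            load[p] += primary_load[i]
            load[q] += backup_load[i]
    return load


if __name__ == "__main__":
    prim = [1.0] * 8   # hypothetical primary loads
    back = [0.1] * 8   # hypothetical (lighter) backup loads
    ph, bh = allocate(prim, back, 4)
    print(ph, bh)
    print(loads_after_fault(prim, back, ph, bh, 4, failed=0))
```

An allocator that treated backups as if they carried the same load as primaries could look balanced before a fault yet pile all of a failed processor's work onto one survivor; tracking the per-failure `extra` load is one simple way to avoid that in this sketch.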
Pages: 499-505
Number of pages: 7