A fault-tolerant hierarchical diagnostic network for massively parallel processing systems

Authors
Choi, YH [1]
Kim, YS
Affiliations
[1] Hongik Univ, Dept Comp Engn, Seoul, South Korea
[2] Hanjin Informat Syst & Telecommun Co, Ctr Res & Dev, Seoul, South Korea
Keywords
massively parallel processors; diagnostic network; VLSI; fault tolerance
DOI
10.1016/S0045-7906(98)00007-X
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Massively parallel processing systems consist of a large number of processing nodes to provide high performance, primarily for data-intensive applications. In a system of such dimensions, high availability cannot be achieved without relying on redundancy and reconfiguration. Important aspects of highly available design are rapid diagnosis and graceful degradation in the event of failures. This paper presents a hierarchical diagnostic network for locating faults in parallel processor systems composed of a large number of identical processing nodes. In the case of a single fault, the network can locate the fault at the time it is detected. Even in the case of multiple faults, it can significantly reduce the test time compared to the well-known binary search. Unlike existing self-diagnostic circuits, the diagnostic network requires little hardware overhead and may tolerate a fault in the network itself. (C) 1998 Elsevier Science Ltd. All rights reserved.
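For context on the baseline the abstract compares against, here is a minimal sketch of binary-search fault isolation. It assumes exactly one faulty node has already been detected and a hypothetical `group_test` oracle that reports whether any node in a given group is faulty; both are illustrative assumptions. This is the generic binary-search baseline, not the authors' hierarchical diagnostic network.

```python
from typing import Callable, List, Tuple


def locate_single_fault(
    nodes: List[int], group_test: Callable[[List[int]], bool]
) -> Tuple[int, int]:
    """Binary-search isolation of a single known fault.

    Assumption: exactly one node in `nodes` is faulty, and the hypothetical
    `group_test(group)` returns True iff the faulty node is in `group`.
    Returns (faulty_node, number_of_group_tests_used).
    """
    lo, hi = 0, len(nodes)  # invariant: the fault lies in nodes[lo:hi]
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if group_test(nodes[lo:mid]):
            hi = mid  # fault is in the lower half
        else:
            lo = mid  # fault must be in the upper half
    return nodes[lo], tests


if __name__ == "__main__":
    faulty = 700
    node, used = locate_single_fault(list(range(1024)), lambda g: faulty in g)
    print(node, used)
```

With 1024 nodes this baseline spends log2(1024) = 10 sequential group tests to isolate one fault, whereas the abstract states that the hierarchical network locates a single fault at the moment it is detected.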
Pages: 349-361
Number of pages: 13