A fault-tolerant hierarchical diagnostic network for massively parallel processing systems

Cited by: 0
Authors
Choi, YH [1]
Kim, YS
Affiliations
[1] Hongik Univ, Dept Comp Engn, Seoul, South Korea
[2] Hanjin Informat Syst & Telecommun Co, Ctr Res & Dev, Seoul, South Korea
Keywords
massively parallel processors; diagnostic network; VLSI; fault tolerance
DOI
10.1016/S0045-7906(98)00007-X
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Massively parallel processing systems consist of a large number of processing nodes to provide high performance, primarily for data-intensive applications. In a system of such dimensions, high availability cannot be achieved without relying on redundancy and reconfiguration. An important aspect of highly available design is rapid diagnosis and graceful degradation in the event of failures. This paper presents a hierarchical diagnostic network for locating faults in parallel processor systems composed of a large number of identical processing nodes. In the case of a single fault, the network can locate the fault at the time it is detected. Even in the case of multiple faults, it can significantly reduce the test time compared to the well-known binary search. Unlike existing self-diagnostic circuits, the diagnostic network requires little hardware overhead and may tolerate a fault in the network itself. (C) 1998 Elsevier Science Ltd. All rights reserved.
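For context, the binary-search baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the generic baseline, not the paper's hierarchical diagnostic network. It assumes exactly one faulty node and a hypothetical `group_test` oracle that reports whether a given subset of nodes contains the fault; the function name `binary_search_locate` is likewise illustrative. Under these assumptions, locating one fault among N nodes takes at most ceil(log2 N) sequential group tests, whereas the abstract states that the proposed network locates a single fault at the time it is detected.

```python
from typing import Callable, Sequence, Tuple


def binary_search_locate(
    nodes: Sequence[int],
    group_test: Callable[[Sequence[int]], bool],
) -> Tuple[int, int]:
    """Locate a single faulty node by repeatedly halving the suspect set.

    Assumes exactly one node in `nodes` is faulty. `group_test(subset)` is a
    hypothetical oracle that returns True if the faulty node is in `subset`.
    Returns (faulty_node, number_of_group_tests_performed).
    """
    if not nodes:
        raise ValueError("need at least one node")
    suspects = list(nodes)
    tests = 0
    while len(suspects) > 1:
        mid = len(suspects) // 2
        first_half = suspects[:mid]
        tests += 1  # one sequential group test per halving step
        # Keep whichever half must contain the single fault.
        suspects = first_half if group_test(first_half) else suspects[mid:]
    return suspects[0], tests


if __name__ == "__main__":
    faulty = 737  # hypothetical faulty node index
    node, tests = binary_search_locate(range(1024), lambda s: faulty in s)
    print(node, tests)  # prints "737 10"
```

With 1,024 nodes the loop performs 10 group tests, one per halving. Each test depends on the previous result, so the tests are inherently sequential, which is the cost a diagnostic scheme that reports the fault location directly would avoid.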
Pages: 349-361
Page count: 13