A fault-tolerant hierarchical diagnostic network for massively parallel processing systems

Authors
Choi, YH [1]
Kim, YS
Affiliations
[1] Hongik Univ, Dept Comp Engn, Seoul, South Korea
[2] Hanjin Informat Syst & Telecommun Co, Ctr Res & Dev, Seoul, South Korea
Keywords
massively parallel processors; diagnostic network; VLSI; fault tolerance
DOI
10.1016/S0045-7906(98)00007-X
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Massively parallel processing systems consist of a large number of processing nodes to provide high performance, primarily for data-intensive applications. In a system of such dimensions, high availability cannot be achieved without relying on redundancy and reconfiguration. An important aspect of highly available design is rapid diagnosis and graceful degradation in the event of failures. This paper presents a hierarchical diagnostic network for locating faults in parallel processor systems composed of a large number of identical processing nodes. In the case of a single fault, the network can locate the fault at the time it is detected. Even in the case of multiple faults, it can significantly reduce the test time compared with the well-known binary search. Unlike existing self-diagnostic circuits, the diagnostic network requires small hardware overhead and may tolerate a fault in the network itself. © 1998 Elsevier Science Ltd. All rights reserved.
Pages: 349-361
Page count: 13