A hierarchical adaptive distributed system-level diagnosis algorithm

Cited by: 64
Authors
Duarte, EP
Nanya, T
Affiliations
[1] Univ Fed Parana, Dept Informat, BR-81531990 Curitiba, Parana, Brazil
[2] Univ Tokyo, Adv Sci & Technol Res Ctr, Meguro Ku, Tokyo 153, Japan
[3] Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 152, Japan
Keywords
system-level diagnosis; adaptive diagnosis; distributed diagnosis; network management; fault management; SNMP
DOI
10.1109/12.656078
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Consider a system composed of N nodes that can be faulty or fault-free. The purpose of distributed system-level diagnosis is to have each fault-free node determine the state of all nodes of the system. This paper presents a Hierarchical Adaptive Distributed System-level Diagnosis (Hi-ADSD) algorithm, which is a fully distributed algorithm that allows every fault-free node to achieve diagnosis in, at most, $(\log_2 N)^2$ testing rounds. Nodes are mapped into progressively larger logical clusters, so that tests are run in a hierarchical fashion. Each node executes its tests independently of the other nodes, i.e., tests are run asynchronously. All the information that nodes exchange is diagnostic information. The algorithm assumes no link faults and a fully connected network, and imposes no bounds on the number of faults. Both the worst-case diagnosis latency and correctness of the algorithm are formally proved. As an example application, the algorithm was implemented on a 37-node Ethernet LAN, integrated into a network management system based on SNMP (Simple Network Management Protocol). Experimental results of fault and repair diagnosis are presented. This implementation by itself is also a significant contribution, for, although fault management is a key functional area of network management systems, currently deployed applications often implement only rudimentary diagnosis mechanisms. Furthermore, experimental results are given through simulation of the algorithm for large systems of 64 nodes and 512 nodes.
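As a rough illustration of the latency bound stated in the abstract, the following minimal Python sketch evaluates (log_2 N)^2 for the two simulated system sizes. The function names `max_testing_rounds` and `cluster_sizes` are hypothetical and not taken from the paper. The doubling progression of cluster sizes is only an assumption about what "progressively larger logical clusters" could look like, and the sketch assumes N is a power of two.

```python
def max_testing_rounds(n: int) -> int:
    """Worst-case bound (log2 N)^2 on testing rounds, as quoted in the abstract.

    Assumption of this sketch: N is a power of two.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes N is a power of two, N >= 2")
    log_n = n.bit_length() - 1  # exact log2 for a power of two
    return log_n ** 2


def cluster_sizes(n: int) -> list[int]:
    """Hypothetical cluster sizes that double at each level: 1, 2, 4, ..., N/2.

    This progression is an illustrative assumption, not the paper's definition.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes N is a power of two, N >= 2")
    log_n = n.bit_length() - 1
    return [2 ** s for s in range(log_n)]


if __name__ == "__main__":
    for n in (64, 512):  # the simulated system sizes mentioned in the abstract
        print(n, max_testing_rounds(n), cluster_sizes(n))
```

Under these assumptions the bound evaluates to 36 testing rounds for 64 nodes and 81 for 512 nodes. The 37-node LAN from the abstract is not a power of two, so this sketch does not cover it.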
Pages: 34-45
Page count: 12