Optimal diagnosis of heterogeneous systems with random faults

Times cited: 11
Author
Pelc, A [1]
Affiliation
[1] Univ Quebec, Dept Informat, Hull, PQ J8X 3X7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
fault diagnosis; fault tolerance; random fault; test
DOI
10.1109/12.660165
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We consider the problem of fault diagnosis in multiprocessor systems. Processors perform tests on one another; fault-free testers correctly identify the fault status of tested processors, while faulty testers can give arbitrary test results. Processors fail with arbitrary probabilities and all failures are independent. The goal is to correctly identify the status of all processors, based on the set of test results. A diagnosis algorithm is optimal if it has the highest probability of correctness (reliability) among all (deterministic) diagnosis algorithms. We give a fast diagnosis algorithm and prove its optimality for arbitrary values of failure probabilities. This is the first time that optimal diagnosis is given for systems without any assumptions on the behavior of faulty processors or on the values of failure probabilities. We also investigate locally optimal diagnosis algorithms: for any set of test results, they return the most probable configuration of faulty and fault-free processors that could yield it. We give a fast diagnosis algorithm that is always locally optimal. If all processors have failure probabilities smaller than 1/2, a locally optimal diagnosis is proved to be optimal. However, if some processors have failure probabilities exceeding 1/2, a locally optimal diagnosis need not have the highest reliability. We even give examples where it may have arbitrarily small reliability as the number of processors increases, while the optimal reliability remains constant.
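The definition of local optimality above can be made concrete with a small brute-force sketch. This is not the fast algorithm from the paper: it enumerates all 2^n fault configurations, so it is only practical for a handful of processors. It assumes a hypothetical encoding of each test as a triple (tester, tested, says_faulty), and it treats a configuration as able to "yield" the test results exactly when every fault-free tester reported the true status of the processor it tested (faulty testers may report anything). Among those consistent configurations it returns one that maximizes the prior probability, i.e. the product of p_i over faulty processors times the product of (1 - p_i) over fault-free ones. All function names are illustrative, not taken from the source.

```python
from itertools import product


def config_probability(faulty, p):
    """Prior probability of a fault configuration under independent failures:
    product of p[i] over faulty processors and (1 - p[i]) over fault-free ones."""
    prob = 1.0
    for i, is_faulty in enumerate(faulty):
        prob *= p[i] if is_faulty else 1.0 - p[i]
    return prob


def is_consistent(faulty, tests):
    """True if this configuration could produce the given test results.

    Each test is a hypothetical triple (tester, tested, says_faulty).
    A fault-free tester must report the tested processor's true status;
    a faulty tester is unconstrained, since it may give arbitrary results."""
    for tester, tested, says_faulty in tests:
        if not faulty[tester] and says_faulty != faulty[tested]:
            return False
    return True


def locally_optimal_diagnosis(p, tests):
    """Brute-force locally optimal diagnosis (illustrative sketch only).

    Returns (configuration, probability) for a most probable configuration
    consistent with the tests. The all-faulty configuration is always
    consistent, so a result always exists."""
    best, best_prob = None, -1.0
    for faulty in product((False, True), repeat=len(p)):
        if is_consistent(faulty, tests):
            prob = config_probability(faulty, p)
            if prob > best_prob:
                best, best_prob = faulty, prob
    return best, best_prob


if __name__ == "__main__":
    # Three processors, each failing with probability 0.1.
    # Processor 0 says 1 is faulty; 1 says 2 is fault-free; 2 says 0 is fault-free.
    tests = [(0, 1, True), (1, 2, False), (2, 0, False)]
    config, prob = locally_optimal_diagnosis([0.1, 0.1, 0.1], tests)
    print(config, round(prob, 6))
```

In this usage example, the only single-fault configuration consistent with the results is "processor 1 faulty", so it is returned as the most probable explanation. As the abstract notes, when some failure probabilities exceed 1/2, an answer chosen this way need not give the highest overall reliability.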
Pages: 298-304
Number of pages: 7