ANALYSIS AND MODELING OF CORRELATED FAILURES IN MULTICOMPUTER SYSTEMS

Cited by: 29
Authors
TANG, D
IYER, RK
Affiliation
[1] Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana
Keywords
CORRELATED FAILURE; DEPENDABILITY EVALUATION; DEPENDABILITY MODELING; FAILURE MEASUREMENT; MARKOV MODEL; MULTICOMPUTER SYSTEM
DOI
10.1109/12.142683
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Based on the measurements from two DEC VAX-cluster multicomputer systems, this paper addresses the issue of correlated failures. In particular, the characteristics of correlated failures, the impact of correlated failures on dependability, and the modeling of correlated failures are discussed. It is found from the data that most correlated failures are related to errors in shared resources and propagate from one machine to another. Comparisons between measurement-based models and analytical models that assume failure independence show that the impact of correlated failures on dependability is significant. Two validated models, the c-dependent model and the p-dependent model, are developed to evaluate dependability of systems with correlated failures.
Pages: 567-577
Page count: 11