ANALYSIS AND MODELING OF CORRELATED FAILURES IN MULTICOMPUTER SYSTEMS

Cited by: 29
Authors
TANG, D
IYER, RK
Affiliation
[1] Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana
Keywords
CORRELATED FAILURE; DEPENDABILITY EVALUATION; DEPENDABILITY MODELING; FAILURE MEASUREMENT; MARKOV MODEL; MULTICOMPUTER SYSTEM;
DOI
10.1109/12.142683
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Based on the measurements from two DEC VAX-cluster multicomputer systems, this paper addresses the issue of correlated failures. In particular, the characteristics of correlated failures, the impact of correlated failures on dependability, and the modeling of correlated failures are discussed. It is found from the data that most correlated failures are related to errors in shared resources and propagate from one machine to another. Comparisons between measurement-based models and analytical models that assume failure independence show that the impact of correlated failures on dependability is significant. Two validated models, the c-dependent model and the p-dependent model, are developed to evaluate dependability of systems with correlated failures.
Pages: 567 - 577
Page count: 11
Related papers
50 in total
  • [31] SPSRG: a prediction approach for correlated failures in distributed computing systems
    Zheng, Weiwei
    Wang, Zhili
    Huang, Haoqiu
    Meng, Luoming
    Qiu, Xuesong
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2016, 19 (04): : 1703 - 1721
  • [32] Reliability of Two Failure Mode Systems Subject to Correlated Failures
    Fiondella, Lance
    Xing, Liudong
    2014 60TH ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM (RAMS), 2014,
  • [33] SPSRG: a prediction approach for correlated failures in distributed computing systems
    Weiwei Zheng
    Zhili Wang
    Haoqiu Huang
    Luoming Meng
    Xuesong Qiu
    Cluster Computing, 2016, 19 : 1703 - 1721
  • [34] Reliability of Heterogeneous Distributed Computing Systems in the Presence of Correlated Failures
    Pezoa, Jorge E.
    Hayat, Majeed M.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (04) : 1034 - 1043
  • [35] ANALYSIS OF DATA TRANSMISSION IN MULTICOMPUTER SYSTEMS WITH DIRECT COUPLING.
    Red'ko, V.A.
    Puzov, V.G.
    Automatic Control and Computer Sciences, 1982, 16 (02) : 6 - 10
  • [36] Efficient Software Reliability Analysis With Correlated Component Failures
    Fiondella, Lance
    Rajasekaran, Sanguthevar
    Gokhale, Swapna S.
    IEEE TRANSACTIONS ON RELIABILITY, 2013, 62 (01) : 244 - 255
  • [37] Time cost analysis for solving difference equations on multicomputer systems
    Diab, H
    ADVANCES IN ENGINEERING SOFTWARE, 1997, 28 (08) : 455 - 462
  • [38] Modeling and Analysis of the Impact of Failures in Electric Power Systems Organized in Interconnected Regions
    Chiaradonna, Silvano
    Di Giandomenico, Felicita
    Nostro, Nicola
    2011 IEEE/IFIP 41ST INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN), 2011, : 442 - 453
  • [39] MODULAR DESIGN OF MULTICOMPUTER SYSTEMS
    UNGER, BW
    BIDULOCK, DS
    SIMULATION, 1981, 37 (01) : 1 - 9
  • [40] Modeling of Correlated Stochastic Processes for the Transient Stability Analysis of Power Systems
    Adeen, Muhammad
    Milano, Federico
    IEEE TRANSACTIONS ON POWER SYSTEMS, 2021, 36 (05) : 4445 - 4456