ANALYSIS AND MODELING OF CORRELATED FAILURES IN MULTICOMPUTER SYSTEMS

Times cited: 29
Authors
TANG, D
IYER, RK
Affiliation
[1] Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana
Keywords
CORRELATED FAILURE; DEPENDABILITY EVALUATION; DEPENDABILITY MODELING; FAILURE MEASUREMENT; MARKOV MODEL; MULTICOMPUTER SYSTEM;
DOI
10.1109/12.142683
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Based on measurements from two DEC VAX-cluster multicomputer systems, this paper addresses the issue of correlated failures. In particular, it discusses the characteristics of correlated failures, their impact on dependability, and their modeling. The data show that most correlated failures are related to errors in shared resources and propagate from one machine to another. Comparisons between measurement-based models and analytical models that assume failure independence show that the impact of correlated failures on dependability is significant. Two validated models, the c-dependent model and the p-dependent model, are developed to evaluate the dependability of systems with correlated failures.
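The abstract's central quantitative claim, that analytical models assuming failure independence understate the effect of correlated failures, can be illustrated with a small continuous-time Markov model. The sketch below is a generic illustration under stated assumptions, not the paper's c-dependent or p-dependent model, whose details the abstract does not give. It assumes a hypothetical two-machine cluster with per-machine failure rate `lam`, a single repair facility with rate `mu`, and a hypothetical probability `p` that a failure on one machine immediately takes down the other; setting `p = 0` recovers the independence assumption. The function names `two_node_generator` and `steady_state` are illustrative only.

```python
"""Illustrative CTMC: effect of failure propagation on a two-machine cluster.

Assumptions (hypothetical, not from the paper): exponential failures with
rate lam per machine, one repair facility with rate mu, and probability p
that a failure immediately propagates to the other machine.
"""


def two_node_generator(lam: float, mu: float, p: float) -> list[list[float]]:
    """Generator matrix Q; states 0 = both up, 1 = one down, 2 = both down."""
    return [
        [-2 * lam, 2 * lam * (1 - p), 2 * lam * p],  # p > 0: correlated double failure
        [mu, -(lam + mu), lam],                       # survivor fails, or repair completes
        [0.0, mu, -mu],                               # single repair facility
    ]


def steady_state(Q: list[list[float]]) -> list[float]:
    """Solve pi Q = 0 with sum(pi) = 1 by Gaussian elimination (partial pivoting)."""
    n = len(Q)
    # Balance equations are the rows of Q^T; one is redundant, so the last
    # one is replaced by the normalization condition sum(pi) = 1.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    A[-1] = [1.0] * n
    b[-1] = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - s) / A[r][r]
    return pi


if __name__ == "__main__":
    lam, mu = 0.001, 0.1  # hypothetical rates per hour
    for p in (0.0, 0.2):
        pi = steady_state(two_node_generator(lam, mu, p))
        print(f"p = {p:.1f}: P(both machines down) = {pi[2]:.3e}")
```

With these made-up rates (one failure per 1000 hours per machine, 10-hour mean repair), a propagation probability of 0.2 raises the steady-state probability that both machines are down from about 2.0e-4 to about 4.1e-3, roughly a twentyfold increase. These figures follow only from the assumed parameters and are not measurements from the paper.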
Pages: 567-577
Number of pages: 11