A simplified reliability analysis method for cloud computing systems considering common-cause failures

Cited by: 2
Authors
Li, Ruiying [1 ,2 ]
Li, Qiong [1 ]
Huang, Ning [1 ,2 ]
Kang, Rui [1 ,2 ]
Affiliations
[1] Beihang Univ, Sch Reliabil & Syst Engn, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[2] Sci & Technol Reliabil & Environm Engn Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cloud computing; reliability modeling; common-cause failure; simplification; fault tree; state-space; MODEL
DOI
10.1177/1748006X17703863
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Virtualization is one of the main features of cloud computing systems: it allows multiple virtual machines to be built on a single server. However, this feature introduces a new challenge in reliability modeling, because the failure of a server makes all of its co-located virtual machines inoperable, which is a typical common-cause failure. To reflect the demand placed on a cloud computing system, the reliability of the system is defined as the probability that at least a given number of virtual machines are operable. State-space enumeration is one method of calculating this reliability; however, because of the large number of state combinations, it is time-consuming and impractical. To solve this problem, we propose a simplified reliability analysis method based on fault tree and state-space models. Two illustrative examples are studied to show the process and the effectiveness of our method. State enumeration and Monte Carlo simulation are also used as back-to-back verifications of the method's correctness. Our results differ considerably from those of a reliability analysis that does not consider common-cause failures, which illustrates the necessity of considering common-cause failures in the reliability analysis of cloud computing systems.
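The reliability definition in the abstract, together with the two verification techniques it names (state enumeration and Monte Carlo simulation), can be illustrated with a minimal Python sketch. This is not the authors' simplified method, only the brute-force baseline that the abstract describes as impractical for large systems. The function names, the system layout (two servers hosting two virtual machines each), and every probability value are hypothetical, and component failures are assumed to be statistically independent apart from the server-to-VM dependency.

```python
from itertools import product
import random

def component_probs(servers):
    """Flatten [(p_server, [p_vm, ...]), ...] into one list of up-probabilities."""
    return [p for p_server, p_vms in servers for p in (p_server, *p_vms)]

def operable_vms(servers, state):
    """Count operable VMs: a VM counts only if its host server is also up."""
    count, i = 0, 0
    for _, p_vms in servers:
        server_up = state[i]
        vm_states = state[i + 1 : i + 1 + len(p_vms)]
        if server_up:  # a server failure is a common cause for all its VMs
            count += sum(vm_states)
        i += 1 + len(p_vms)
    return count

def reliability_by_enumeration(servers, k):
    """Sum the probabilities of all 2^n component states with >= k operable VMs."""
    probs = component_probs(servers)
    total = 0.0
    for state in product((0, 1), repeat=len(probs)):
        if operable_vms(servers, state) >= k:
            weight = 1.0
            for up, p in zip(state, probs):
                weight *= p if up else 1.0 - p
            total += weight
    return total

def reliability_by_monte_carlo(servers, k, trials=100_000, seed=0):
    """Estimate the same probability by sampling random component states."""
    rng = random.Random(seed)
    probs = component_probs(servers)
    hits = 0
    for _ in range(trials):
        state = [rng.random() < p for p in probs]
        hits += operable_vms(servers, state) >= k
    return hits / trials

# Hypothetical system: 2 servers (up-probability 0.9), 2 VMs each (0.8).
servers = [(0.9, [0.8, 0.8]), (0.9, [0.8, 0.8])]
# Same VMs with perfectly reliable servers, i.e. common-cause failures ignored.
no_ccf = [(1.0, [0.8, 0.8]), (1.0, [0.8, 0.8])]

if __name__ == "__main__":
    print("with CCF   :", reliability_by_enumeration(servers, k=2))
    print("Monte Carlo:", reliability_by_monte_carlo(servers, k=2))
    print("without CCF:", reliability_by_enumeration(no_ccf, k=2))
```

With these assumed numbers and a requirement of at least 2 operable virtual machines, enumeration gives a reliability of about 0.903 when server failures are included and 0.9728 when they are ignored. The enumeration loop visits 2^n states for n components, which is why it stops being usable as the system grows.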
Pages: 324-333
Page count: 11