Toward a Smart Cloud: A Review of Fault-Tolerance Methods in Cloud Systems

Cited by: 55
Authors
Mukwevho, Mukosi Abraham [1]
Celik, Turgay [1, 2, 3]
Affiliations
[1] Univ Witwatersrand, Sch Elect & Informat Engn, Johannesburg, South Africa
[2] Univ Witwatersrand, Wits Inst Data Sci, Johannesburg, South Africa
[3] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu, Peoples R China
Keywords
Cloud computing; Fault tolerance; Fault tolerant systems; Software as a service; Computer architecture; fault-tolerance; reliability; availability; smart cloud; machine learning; artificial intelligence; STRATEGY; VISION;
DOI
10.1109/TSC.2018.2816644
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a comprehensive survey of state-of-the-art fault-tolerance methods proposed for cloud computing. The survey classifies fault-tolerance methods into three categories: 1) ReActive Methods (RAMs); 2) PRoactive Methods (PRMs); and 3) ReSilient Methods (RSMs). RAMs allow the system to enter a fault state and then try to recover it. PRMs try to prevent the system from entering a fault state by implementing mechanisms that avoid errors before they affect the system. The recently emerging RSMs, by contrast, aim to minimize the time a system takes to recover from a fault. Machine Learning and Artificial Intelligence have played an active role in the RSM domain: the recovery time is mapped to a function to be optimized (i.e., by converging the recovery time to a fraction of milliseconds). As the system learns to deal with new faults, the recovery time becomes shorter. Current issues and challenges in cloud fault tolerance are also discussed to identify promising areas for future research.
Pages: 589-605
Number of pages: 17