Toward a Smart Cloud: A Review of Fault-Tolerance Methods in Cloud Systems

Cited by: 55
Authors
Mukwevho, Mukosi Abraham [1]
Celik, Turgay [1, 2, 3]
Affiliations
[1] Univ Witwatersrand, Sch Elect & Informat Engn, Johannesburg, South Africa
[2] Univ Witwatersrand, Wits Inst Data Sci, Johannesburg, South Africa
[3] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu, Peoples R China
Keywords
Cloud computing; Fault tolerance; Fault tolerant systems; Software as a service; Computer architecture; fault-tolerance; reliability; availability; smart cloud; machine learning; artificial intelligence; STRATEGY; VISION
DOI
10.1109/TSC.2018.2816644
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper presents a comprehensive survey of the state-of-the-art work on fault-tolerance methods proposed for cloud computing. The survey classifies fault-tolerance methods into three categories: 1) ReActive Methods (RAMs); 2) PRoactive Methods (PRMs); and 3) ReSilient Methods (RSMs). RAMs allow the system to enter a fault state and then try to recover it. PRMs try to prevent the system from entering a fault state by implementing mechanisms that avoid errors before they affect the system. The recently emerging RSMs, by contrast, aim to minimize the time a system takes to recover from a fault. Machine Learning and Artificial Intelligence play an active role in the RSM domain: the recovery time is mapped to a function to be optimized (i.e., by converging the recovery time to a fraction of milliseconds). As the system learns to deal with new faults, the recovery time becomes shorter. Current issues and challenges in cloud fault tolerance are also discussed to identify promising areas for future research.
Pages: 589-605
Number of pages: 17
Related papers
50 in total
  • [21] Protecting Medical Data in Cloud Storage Using Fault-Tolerance Mechanism
    Marwan, M.
    Kartit, A.
    Ouahmane, H.
    2017 INTERNATIONAL CONFERENCE ON SMART DIGITAL ENVIRONMENT (ICSDE'17), 2017, : 214 - 219
  • [22] Dynamic Fault-tolerance and Mobility Provisioning for Services on Mobile Cloud Platforms
    Stahl, Philip
    Broberg, Jonatan
    Landfeldt, Bjorn
    2017 5TH IEEE INTERNATIONAL CONFERENCE ON MOBILE CLOUD COMPUTING, SERVICES, AND ENGINEERING (MOBILECLOUD), 2017, : 131 - 138
  • [23] A Review Paper on Fault Tolerance in Cloud Computing
    Mittal, Deepali
    Agarwal, Neha
    2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015, : 31 - 34
  • [24] Flexible Supervision System: A Fast Fault-Tolerance Strategy for Cloud Applications in Cloud-Edge Collaborative Environments
    Cai, Weilin
    Chen, Heng
    Zhuo, Zhimin
    Wang, Ziheng
    An, Ninggang
    NETWORK AND PARALLEL COMPUTING, NPC 2022, 2022, 13615 : 108 - 113
  • [25] A practical cross-datacenter fault-tolerance algorithm in the cloud storage system
    Yuxia Cheng
    Xinjie Yu
    Wenzhi Chen
    Rui Chang
    Yang Xiang
    Cluster Computing, 2017, 20 : 1801 - 1813
  • [26] A practical cross-datacenter fault-tolerance algorithm in the cloud storage system
    Cheng, Yuxia
    Yu, Xinjie
    Chen, Wenzhi
    Chang, Rui
    Xiang, Yang
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2017, 20 (02): : 1801 - 1813
  • [27] QoS-Aware Task Placement With Fault-Tolerance in the Edge-Cloud
    Sun, Huaiying
    Yu, Huiqun
    Fan, Guisheng
    Chen, Liqiong
    IEEE ACCESS, 2020, 8 : 77987 - 78003
  • [28] Dynamic Approach Based on Learning Automata for Data Fault-Tolerance in the Cloud Storage
    Hosseini, Seyyed Mansour
    Arani, Mostafa Ghobaei
    Kenari, Abdol Reza Rasouli
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2015, 8 (06): : 91 - 103
  • [29] Fault-tolerance in biochemical systems
    Winfree, Erik
    UNCONVENTIONAL COMPUTATION, PROCEEDINGS, 2006, 4135 : 26 - 26
  • [30] Toward a Fault-Tolerance Framework for COTS Many-Core Systems
    Munk, Peter
    Alhakeem, Mohammad Shadi
    Lisicki, Raphael
    Parzyjegla, Helge
    Richling, Jan
    Heiss, Hans-Ulrich
    2015 ELEVENTH EUROPEAN DEPENDABLE COMPUTING CONFERENCE (EDCC), 2015, : 167 - 177