Toward a Smart Cloud: A Review of Fault-Tolerance Methods in Cloud Systems

Cited by: 55
Authors
Mukwevho, Mukosi Abraham [1]
Celik, Turgay [1, 2, 3]
Affiliations
[1] Univ Witwatersrand, Sch Elect & Informat Engn, Johannesburg, South Africa
[2] Univ Witwatersrand, Wits Inst Data Sci, Johannesburg, South Africa
[3] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu, Peoples R China
Keywords
Cloud computing; Fault tolerance; Fault tolerant systems; Software as a service; Computer architecture; fault-tolerance; reliability; availability; smart cloud; machine learning; artificial intelligence; STRATEGY; VISION
DOI
10.1109/TSC.2018.2816644
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents a comprehensive survey of state-of-the-art fault-tolerance methods proposed for cloud computing. The survey classifies fault-tolerance methods into three categories: 1) ReActive Methods (RAMs); 2) PRoactive Methods (PRMs); and 3) ReSilient Methods (RSMs). RAMs allow the system to enter a faulty state and then try to recover it. PRMs try to prevent the system from entering a faulty state by implementing mechanisms that avoid errors before they affect the system. The recently emerging RSMs, by contrast, aim to minimize the time it takes for a system to recover from a fault. Machine learning and artificial intelligence have played an active role in the RSM domain: the recovery time is mapped to a function to be optimized (i.e., by converging the recovery time toward a fraction of a millisecond). As the system learns to deal with new faults, the recovery time becomes shorter. Current issues and challenges in cloud fault tolerance are also discussed to identify promising areas for future research.
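The difference between the first two categories can be illustrated with a minimal Python sketch. It is not taken from the paper: the `Task` class, the `run_reactive` and `run_proactive` functions, the checkpoint interval, and the health threshold are all hypothetical names and values chosen for illustration. The reactive runner lets a fault occur and then rolls back to the last checkpoint; the proactive runner reads an assumed health signal and acts before a fault can occur.

```python
from dataclasses import dataclass


@dataclass
class Task:
    total_steps: int
    done: int = 0         # progress so far
    checkpoint: int = 0   # last saved progress


def run_reactive(task, fail_at, checkpoint_every=2):
    """Reactive: let the fault happen, then roll back to the last checkpoint."""
    recoveries = 0
    pending_faults = set(fail_at)
    while task.done < task.total_steps:
        step = task.done + 1
        if step in pending_faults:
            pending_faults.discard(step)  # each fault occurs only once
            task.done = task.checkpoint   # recover: restore saved state
            recoveries += 1
            continue
        task.done = step
        if task.done % checkpoint_every == 0:
            task.checkpoint = task.done   # save progress periodically
    return recoveries


def run_proactive(task, health, threshold=0.5):
    """Proactive: act on a health signal before a fault can occur."""
    migrations = 0
    while task.done < task.total_steps:
        step = task.done + 1
        if health.get(step, 1.0) < threshold:
            migrations += 1               # e.g. move work to a healthy node
        task.done = step                  # step completes without a fault
    return migrations


reactive_task = Task(total_steps=6)
reactive_recoveries = run_reactive(reactive_task, fail_at={4})

proactive_task = Task(total_steps=6)
proactive_migrations = run_proactive(proactive_task, health={4: 0.2})
```

In this toy run the reactive task repeats step 3 after rolling back from the fault at step 4, while the proactive task never loses progress. A resilient method in the abstract's sense would additionally try to learn from each recovery so that later recoveries take less time; that learning step is not modeled here.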
Pages: 589-605
Number of pages: 17
Related papers
50 in total
  • [1] Multiple Fault-tolerance Mechanisms in Cloud Systems: a Systematic Review
    Marcotte, Philippe
    Gregoire, Frederic
    Petrillo, Fabio
    [J]. 2019 IEEE 30TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING WORKSHOPS (ISSREW 2019), 2019, : 414 - 421
  • [2] Extendibility, scalability and fault-tolerance methods for cloud robots especially for cloud nanorobots
    Zhu, Dingju
    [J]. Journal of Computational and Theoretical Nanoscience, 2015, 12 (12) : 6208 - 6219
  • [3] Fault-Tolerance in the Scope of Cloud Computing
    Rehman, A. U.
    Aguiar, Rui L.
    Barraca, Joao Paulo
    [J]. IEEE ACCESS, 2022, 10 : 63422 - 63441
  • [4] A lightweight software fault-tolerance system in the cloud environment
    Chen, Gang
    Jin, Hai
    Zou, Deqing
    Zhou, Bing Bing
    Qiang, Weizhong
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, 27 (12) : 2982 - 2998
  • [5] An Efficient Intermediate Data Fault-Tolerance Approach in the Cloud
    Song, Baoyan
    Ren, Cai
    Li, Xuecheng
    Ding, Linlin
    [J]. 2014 11TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA), 2014, : 203 - 206
  • [6] A SCHEME OF DATA CONFIDENTIALITY AND FAULT-TOLERANCE IN CLOUD STORAGE
    Fu, Yongkang
    Sun, Bin
    [J]. 2012 IEEE 2ND INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENT SYSTEMS (CCIS) VOLS 1-3, 2012, : 228 - 233
  • [7] Proactive Fault-Tolerance Technique to Enhance Reliability of Cloud Service in Cloud Federation Environment
    Ray, Benay Kumar
    Saha, Avirup
    Khatua, Sunirmal
    Roy, Sarbani
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2022, 10 (02) : 957 - 971
  • [8] Lightweight secure storage model with fault-tolerance in cloud environment
    Ahmed, Muhra
    Vu, Quang Hieu
    Asal, Rasool
    Al Muhairi, Hassan
    Yeun, Chan Yeob
    [J]. ELECTRONIC COMMERCE RESEARCH, 2014, 14 (03) : 271 - 291
  • [9] Lightweight secure storage model with fault-tolerance in cloud environment
    Muhra Ahmed
    Quang Hieu Vu
    Rasool Asal
    Hassan Al Muhairi
    Chan Yeob Yeun
    [J]. Electronic Commerce Research, 2014, 14 : 271 - 291
  • [10] Adaptive Application Scaling for Improving Fault-Tolerance and Availability in the Cloud
    Radhakrishnan, Ganesan
    [J]. BELL LABS TECHNICAL JOURNAL, 2012, 17 (02) : 5 - 14