Toward a Smart Cloud: A Review of Fault-Tolerance Methods in Cloud Systems

Cited by: 55
Authors
Mukwevho, Mukosi Abraham [1]
Celik, Turgay [1, 2, 3]
Affiliations
[1] Univ Witwatersrand, Sch Elect & Informat Engn, Johannesburg, South Africa
[2] Univ Witwatersrand, Wits Inst Data Sci, Johannesburg, South Africa
[3] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu, Peoples R China
Keywords
Cloud computing; Fault tolerance; Fault tolerant systems; Software as a service; Computer architecture; fault-tolerance; reliability; availability; smart cloud; machine learning; artificial intelligence; STRATEGY; VISION
DOI
10.1109/TSC.2018.2816644
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a comprehensive survey of the state-of-the-art work on fault-tolerance methods proposed for cloud computing. The survey classifies fault-tolerance methods into three categories: 1) ReActive Methods (RAMs); 2) PRoactive Methods (PRMs); and 3) ReSilient Methods (RSMs). RAMs allow the system to enter a fault status and then try to recover it. PRMs try to prevent the system from entering a fault status by implementing mechanisms that avoid errors before they affect the system. The recently emerging RSMs, by contrast, aim to minimize the time it takes for a system to recover from a fault. Machine Learning and Artificial Intelligence have played an active role in the RSM domain, where the recovery time is mapped to a function to be optimized (i.e., by converging the recovery time to a fraction of milliseconds). As the system learns to deal with new faults, the recovery time becomes shorter. Current issues and challenges in cloud fault tolerance are also discussed to identify promising areas for future research.
Pages: 589-605
Page count: 17
Related papers
50 in total
  • [41] Fault-Tolerance in Cyber-Physical Systems: Literature Review and Challenges
    Piardi, Luis
    Leitao, Paulo
    de Oliveira, Andre Schneider
    2020 IEEE 18TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), VOL 1, 2020, : 29 - 34
  • [42] Fault Tolerance in Cloud: A Brief Survey
    Agarwal, Kamal K.
    Kotakula, Haribabu
    ADVANCED INFORMATION NETWORKING AND APPLICATIONS, AINA-2022, VOL 3, 2022, 451 : 578 - 589
  • [43] Fault Tolerance in Cloud Computing - Survey
    Ataallah, Salma M. A.
    Nassar, Salwa M.
    Hemayed, Elsayed E.
    2015 11TH INTERNATIONAL COMPUTER ENGINEERING CONFERENCE (ICENCO), 2015, : 241 - 245
  • [44] A survey of fault tolerance in cloud computing
    Kumari, Priti
    Kaur, Parmeet
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2021, 33 (10) : 1159 - 1176
  • [45] Efficient Fault Tolerance on Cloud Environments
    Goundar, Sam
    Bhardwaj, Akashdeep
    INTERNATIONAL JOURNAL OF CLOUD APPLICATIONS AND COMPUTING, 2018, 8 (03) : 20 - 31
  • [46] Inspection of Fault Tolerance in Cloud Environment
    Jain, Deepanshu
    Zaidi, Nabeel
    Bansal, Raghav
    Kumar, Praveen
    Choudhury, Tanupriya
    INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS, INDIA 2017, 2018, 672 : 1022 - 1030
  • [47] Fault Tolerance in a Cloud of Databases environment
    Chatti, Syrine
    Ounelli, Habib
    2017 31ST IEEE INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS (IEEE WAINA 2017), 2017, : 166 - 171
  • [48] Toward a reliable, secure and fault tolerant smart grid state estimation in the cloud
    Maheshwari, Ketan
    Lim, Marcus
    Wang, Lydia
    Birman, Ken
    van Renesse, Robbert
    2013 IEEE PES INNOVATIVE SMART GRID TECHNOLOGIES (ISGT), 2013,
  • [49] Fault-tolerance in air traffic control systems
    Cristian, F
    Dancey, B
    Dehn, J
    ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1996, 14 (03) : 265 - 286
  • [50] Verifying Fault-Tolerance in Probabilistic Swarm Systems
    Lomuscio, Alessio
    Pirovano, Edoardo
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 325 - 331