Fault Tolerance of Cloud Infrastructure with Machine Learning

Cited by: 2
Authors
Kalaskar, Chetankumar [1]
Thangam, S. [1]
Affiliations
[1] Amrita Vishwavidyapeetam, Amrita Sch Comp, Dept Comp Sci & Engn, Bangalore 560035, Karnataka, India
Keywords
Cloud computing; Fault tolerance; Machine learning; Reliability of cloud;
DOI
10.2478/cait-2023-0034
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Enhancing the fault tolerance of cloud systems and accurately forecasting cloud performance are pivotal concerns in cloud computing research. This research addresses both using machine learning models. Using the Google trace dataset, with 10,000 cloud environment records covering diverse metrics, we systematically employed machine learning algorithms, including linear regression, decision trees, and gradient boosting, to construct predictive models. These models outperformed baseline methods, with C5.0 and XGBoost showing exceptional accuracy, precision, and reliability in forecasting cloud behavior. Feature importance analysis identified the ten most influential factors affecting cloud system performance. This work advances cloud optimization and reliability by enabling proactive monitoring, early detection of performance issues, and improved fault tolerance. Future research can further refine these predictive models to enhance cloud resource management and ultimately improve service delivery in cloud computing.
Pages: 26-50
Number of pages: 25