High-Availability Computing Platform with Sensor Fault Resilience

Cited by: 3
Authors
Lee, Yen-Lin [1]
Arizky, Shinta Nuraisya [1]
Chen, Yu-Ren [1, 2]
Liang, Deron [1]
Wang, Wei-Jen [1]
Affiliations
[1] National Central University, Department of Computer Science and Information Engineering, Taoyuan 320, Taiwan
[2] Institute for Information Industry, Taipei 106, Taiwan
Keywords
failover; high availability; sensor fault; fault detection and recovery; liveness detection
DOI
10.3390/s21020542
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Modern computing platforms typically use multiple sensors to report system information. To achieve high availability (HA), a platform can use these sensors to efficiently detect system faults that cause a cloud service to lose liveness. However, a sensor itself may fail, which disables HA protection. In that case, human intervention is needed, either to change the original fault model or to fix the sensor fault. This study therefore proposes an HA mechanism that continuously provides HA to a cloud system through dynamic fault model reconstruction. We implemented the proposed mechanism on a four-layer OpenStack cloud system and tested its performance for all possible sets of sensor faults. For each fault model, we injected possible system faults and measured the average fault detection time. The experimental results show that the proposed mechanism can accurately detect and recover from an injected system fault even when sensors are disabled. In addition, the system fault detection time increases with the number of sensor faults until the HA mechanism degrades to a one-system-fault model, the worst case, which is equivalent to heartbeating at the system layer.
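The idea of rebuilding a fault model when sensors fail can be illustrated with a minimal sketch. It is not the authors' implementation: the layer names, sensor names, and latency figures below are hypothetical, and the fallback rule (a layer whose sensor has failed is covered by a slower system-level heartbeat) is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensor:
    name: str       # hypothetical sensor name
    layer: str      # layer whose faults this sensor reports
    detect_ms: int  # illustrative detection latency, not a measured value


# Assumed fallback detector: a slow system-level heartbeat.
HEARTBEAT = Sensor("heartbeat", "system", 5000)

# Four hypothetical layers, one sensor each.
SENSORS = [
    Sensor("s1", "layer1", 100),
    Sensor("s2", "layer2", 200),
    Sensor("s3", "layer3", 300),
    Sensor("s4", "layer4", 400),
]


def rebuild_fault_model(sensors, failed):
    """Map each layer to the detector that currently covers it.

    A layer keeps its own sensor while that sensor works; otherwise it
    falls back to the heartbeat (an assumption of this sketch).
    """
    return {s.layer: (HEARTBEAT if s.name in failed else s) for s in sensors}


def average_detection_ms(model):
    """Average detection latency over all layers in the model."""
    return sum(d.detect_ms for d in model.values()) / len(model)


if __name__ == "__main__":
    for failed in (set(), {"s1"}, {"s1", "s2"}, {"s1", "s2", "s3", "s4"}):
        model = rebuild_fault_model(SENSORS, failed)
        print(sorted(failed), average_detection_ms(model))
```

Under these assumed numbers, the average detection time grows as more sensors fail, and once every sensor has failed each layer is covered only by the heartbeat, which corresponds to the degraded one-system-fault case described in the abstract.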
Pages: 1-16
Number of pages: 16