Edge Workloads Monitoring and Failover: a StarlingX-Based Testbed Implementation and Measurement Study

Authors
Abuibaid, Mohammed [1]
Ghorab, Amir Hossein [1]
Seguin-Mcpeake, Aidan [2]
Yuen, Owen [2]
Yungblut, Thomas [2]
St-Hilaire, Marc [1, 2]
Affiliations
[1] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada
Keywords
Cloud computing; Internet of Things; Monitoring; Edge computing; Scalability; Distributed computing; Collaboration; Failure analysis; Distributed cloud infrastructure; failover; IoT; Kubernetes; microservice architecture; StarlingX platform; testbed
DOI
10.1109/ACCESS.2022.3204976
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the ever-growing number of time-critical, compute-intensive, and private IoT applications, High Availability (HA) Edge Clouds have become indispensable. Realizing HA Edge Clouds is inherently challenging due to the geographically dispersed hierarchy of the Distributed Cloud Infrastructure (DCI). For example, frequent isolation between the central Cloud and Edge Clouds due to networking instability necessitates some autonomous operations at the Edge Clouds. Furthermore, because Edge Clouds have fewer resources than central Clouds, configuring the Edge functions (i.e., control, compute, and storage) in HA clusters will reduce downtime but limit Edge scalability. To that end, StarlingX is developing an HA-protected and scalable DCI virtualization platform based on the open-source ecosystem, focusing on low-touch management of Edge Clouds. StarlingX provides a fault management service that realizes DCI-wide alarming and logging capabilities, allowing for rapid response to virtualized infrastructure events. Recently, the IETF Network Working Group proposed that monitoring both the DCI and the Edge workloads (software containers) is critical for an Edge Computing Platform to maintain HA IoT application deployment. Indeed, because the infrastructure can remain stable and healthy while the workloads suffer a fatal failure, failover functionality must monitor both the infrastructure and the Edge workloads. In this paper, motivated by the IETF direction, we first propose a dynamic failover functionality that centrally monitors Edge workloads to recover from deployment or Edge node failures. Second, we experimentally optimize the failover functionality for monitoring a microservice-architected IoT application deployed on a StarlingX-based DCI testbed to collect temperature sensor readings from Raspberry Pis.
Regardless of how quickly the Edge workload health checks are collected, the recorded failover measurements reveal that the recovery time will not drop below a level determined by Edge resources and network speed. Furthermore, reducing the statistics collection timeout reduces the recovery time of an Edge node failure. However, when the timeout value is less than the minimum achievable recovery time, false-positive failures (FPFs) can occur. Third, to supplement the StarlingX fault management service, we provide a modular implementation of the proposed failover functionality. Finally, we present the first-ever introduction to the StarlingX platform's software stack to promote its use in academic research.
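The relationship between the collection timeout, the recovery time, and false-positive failures can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `FailoverMonitor`, `simulate`, `timeout_s`, `recovery_s`, and `sensor-collector` are hypothetical, time advances in whole seconds, and a redeployment, once started, is assumed not to be restarted by later detections.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    """Central monitor that flags workloads whose health reports go stale."""
    timeout_s: int                                  # hypothetical collection timeout
    last_seen: dict = field(default_factory=dict)   # workload name -> last report time

    def report(self, workload: str, now_s: int) -> None:
        """Record a health report (or restart the timeout window)."""
        self.last_seen[workload] = now_s

    def failed(self, now_s: int) -> list:
        """Return workloads with no report for longer than the timeout."""
        return sorted(w for w, t in self.last_seen.items()
                      if now_s - t > self.timeout_s)


def simulate(timeout_s: int, recovery_s: int) -> tuple:
    """Simulate one failure at t=0; return (total_recovery_time, false_positives).

    Assumption: the workload sends its last report at t=0 and then fails.
    The first detection starts a redeployment that takes recovery_s seconds.
    Any detection raised while that redeployment is still running is counted
    as a false positive.
    """
    name = "sensor-collector"        # hypothetical workload name
    monitor = FailoverMonitor(timeout_s)
    monitor.report(name, 0)
    recovered_at = None              # time at which the redeployment completes
    false_positives = 0
    t = 0
    while True:
        t += 1
        if recovered_at is not None and t >= recovered_at:
            return t, false_positives
        if monitor.failed(t):
            if recovered_at is None:
                recovered_at = t + recovery_s    # genuine detection
            else:
                false_positives += 1             # recovery already in progress
            monitor.report(name, t)              # restart the timeout window


if __name__ == "__main__":
    for timeout_s, recovery_s in [(5, 3), (2, 3), (2, 7)]:
        total, fpf = simulate(timeout_s, recovery_s)
        print(f"timeout={timeout_s}s recovery={recovery_s}s "
              f"-> recovered after {total}s, false positives={fpf}")
```

In this toy model a shorter timeout shortens the total recovery time, the total never falls below `recovery_s`, and detections repeat once the timeout is shorter than the recovery time, which mirrors the qualitative behaviour described in the abstract.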
Pages: 97101-97116
Page count: 16