Edge Workloads Monitoring and Failover: a StarlingX-Based Testbed Implementation and Measurement Study

Authors
Abuibaid, Mohammed [1]
Ghorab, Amir Hossein [1]
Seguin-Mcpeake, Aidan [2]
Yuen, Owen [2]
Yungblut, Thomas [2]
St-Hilaire, Marc [1, 2]
Affiliations
[1] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada
Keywords
Cloud computing; Internet of Things; Monitoring; Edge computing; Scalability; Distributed computing; Collaboration; Failure analysis; Distributed cloud infrastructure; edge computing; failover; IoT; Kubernetes; microservice architecture; StarlingX platform; testbed;
DOI
10.1109/ACCESS.2022.3204976
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the ever-growing amount of time-critical, compute-intensive, and private IoT applications, the need for High Availability (HA) Edge Clouds becomes indispensable. Realizing HA Edge Clouds is inherently challenging due to the geographically-dispersed hierarchy of the Distributed Cloud Infrastructure (DCI). For example, frequent isolation between the central Cloud and Edge Clouds due to networking instability necessitates some autonomous operations at the Edge Clouds. Furthermore, because Edge Clouds have fewer resources than central Clouds, configuring the Edge functions (i.e., control, compute, and storage) in HA clusters will undoubtedly reduce downtime. However, it will limit the Edge scalability. To that end, StarlingX is developing an HA-protected and scalable DCI virtualization platform based on the open-source ecosystem, focusing on low-touch management of Edge Clouds. StarlingX provides a fault management service that realizes DCI-wide alarming and logging capabilities, allowing for rapid response to virtualized infrastructure events. Recently, the IETF Network Working Group proposed that monitoring both the DCI and the Edge workloads (software containers) is critical for an Edge Computing Platform to maintain HA IoT application deployment. Indeed, the possibility of the infrastructure remaining stable and healthy while the workloads suffer a fatal failure simultaneously necessitates failover functionality that monitors both the infrastructure and the Edge workloads. In this paper, we first propose a dynamic failover functionality that centrally monitors Edge workloads to recover from deployment or Edge node failures, motivated by the IETF direction. Second, we experimentally optimize the failover functionality for monitoring a microservice-architected IoT application deployed on a StarlingX-based DCI testbed to collect temperature sensor readings from Raspberry Pis. 
Regardless of how quickly the Edge workload health checks are collected, the recorded failover measurements reveal that the recovery time will not drop below a predetermined level controlled by Edge resources and network speed. Furthermore, reducing the statistics collection timeout reduces the recovery time of an Edge node failure. When the timeout value is less than the minimum achievable recovery time, false-positive failures (FPFs) can occur. Third, to supplement the StarlingX fault management service, we provide a modular implementation of the proposed failover functionality. Finally, we present the first-ever introduction of the StarlingX platform's software stack to promote its use in academic research.
Pages: 97101-97116
Number of pages: 16