Edge Workloads Monitoring and Failover: a StarlingX-Based Testbed Implementation and Measurement Study

Authors
Abuibaid, Mohammed [1]
Ghorab, Amir Hossein [1]
Seguin-Mcpeake, Aidan [2]
Yuen, Owen [2]
Yungblut, Thomas [2]
St-Hilaire, Marc [1, 2]
Affiliations
[1] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada
Keywords
Cloud computing; Internet of Things; Monitoring; Edge computing; Scalability; Distributed computing; Collaboration; Failure analysis; Distributed cloud infrastructure; failover; IoT; Kubernetes; microservice architecture; StarlingX platform; testbed
DOI
10.1109/ACCESS.2022.3204976
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the ever-growing number of time-critical, compute-intensive, and private IoT applications, High Availability (HA) Edge Clouds have become indispensable. Realizing HA Edge Clouds is inherently challenging due to the geographically dispersed hierarchy of the Distributed Cloud Infrastructure (DCI). For example, frequent isolation between the central Cloud and the Edge Clouds, caused by networking instability, necessitates some autonomous operation at the Edge Clouds. Furthermore, because Edge Clouds have fewer resources than central Clouds, configuring the Edge functions (i.e., control, compute, and storage) in HA clusters will reduce downtime but limit Edge scalability. To that end, StarlingX is developing an HA-protected and scalable DCI virtualization platform based on the open-source ecosystem, focusing on low-touch management of Edge Clouds. StarlingX provides a fault management service that realizes DCI-wide alarming and logging capabilities, allowing for rapid response to virtualized infrastructure events. Recently, the IETF Network Working Group proposed that monitoring both the DCI and the Edge workloads (software containers) is critical for an Edge Computing Platform to maintain HA IoT application deployment. Indeed, because the infrastructure can remain stable and healthy while the workloads suffer a fatal failure, failover functionality must monitor both the infrastructure and the Edge workloads. In this paper, motivated by the IETF direction, we first propose a dynamic failover functionality that centrally monitors Edge workloads to recover from deployment or Edge node failures. Second, we experimentally optimize the failover functionality for monitoring a microservice-architected IoT application deployed on a StarlingX-based DCI testbed to collect temperature sensor readings from Raspberry Pis. The recorded failover measurements reveal that, regardless of how quickly the Edge workload health checks are collected, the recovery time will not drop below a minimum level determined by Edge resources and network speed. Furthermore, reducing the statistics collection timeout reduces the recovery time from an Edge node failure. However, when the timeout value is less than the minimum achievable recovery time, false-positive failures (FPFs) can occur. Third, to supplement the StarlingX fault management service, we provide a modular implementation of the proposed failover functionality. Finally, we present the first introduction to the StarlingX platform's software stack to promote its use in academic research.
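The relationship between the collection timeout and false-positive failures can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's implementation: the function name `count_failure_declarations`, its parameters, and the model itself (a central monitor that declares a failure after `timeout_s` of silence and restarts the redeployment each time it does so) are hypothetical and chosen for illustration.

```python
def count_failure_declarations(timeout_s: float,
                               min_recovery_s: float,
                               horizon_s: float,
                               step_s: float = 1.0) -> int:
    """Count how often a monitor declares a failure after ONE real failure at t=0.

    Assumed toy model (not taken from the paper):
    - the workload stops reporting health at t=0;
    - the monitor declares a failure once the silence exceeds timeout_s
      and triggers a redeployment;
    - a redeployed workload needs min_recovery_s before its first report;
    - every new declaration restarts the redeployment from scratch.
    Any declaration beyond the first is a false positive in this model.
    """
    last_seen = 0.0          # last health report, or last redeploy trigger
    recovering_until = None  # time at which the redeployed workload reports
    declarations = 0
    t = 0.0
    while t <= horizon_s:
        if recovering_until is not None and t >= recovering_until:
            last_seen = t    # recovered workload reports on every step
        if t - last_seen > timeout_s:
            declarations += 1
            last_seen = t    # start timing the new deployment
            recovering_until = t + min_recovery_s
        t += step_s
    return declarations


if __name__ == "__main__":
    # Timeout above the minimum recovery time versus below it.
    for timeout in (5.0, 2.0):
        n = count_failure_declarations(timeout, min_recovery_s=3.0 if timeout > 3 else 5.0,
                                       horizon_s=30.0)
        print(f"timeout={timeout}s -> {n} failure declaration(s)")
```

In this model, a timeout at or above the minimum recovery time yields a single declaration for the single real failure, while a shorter timeout keeps flagging a deployment that is still recovering normally.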
Pages: 97101-97116
Number of pages: 16