Multi-Tier GPU Virtualization for Deep Learning in Cloud-Edge Systems

Cited by: 3
Authors
Kennedy, Jason [1]
Sharma, Vishal [1]
Varghese, Blesson [2]
Reano, Carlos [3]
Affiliations
[1] Queens Univ Belfast, Belfast BT7 1NN, Northern Ireland
[2] Univ St Andrews, St Andrews KY16 9AJ, Scotland
[3] Univ Valencia, Valencia 46022, Spain
Keywords
Cloud computing; Containers; Virtualization; Graphics processing units; Deep learning; Data centers; Virtual machine monitors; Accelerators; Edge computing; Migration
DOI
10.1109/TPDS.2023.3274957
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Accelerator virtualization offers several advantages in the context of cloud-edge computing. Relatively weak user devices can enhance performance when running workloads by accessing virtualized accelerators available on other resources in the cloud-edge continuum. However, cloud-edge systems are heterogeneous, often leading to compatibility issues arising from various hardware and software stacks present in the system. One mechanism to alleviate this issue is using containers for deploying workloads. Containers isolate applications and their dependencies and store them as images that can run on any device. In addition, user devices may move during the course of application execution, and thus mechanisms such as container migration are required to move running workloads from one resource to another in the network. Furthermore, an optimal destination will need to be determined when migrating between virtual accelerators. Scheduling and placement strategies are incorporated to choose the best possible location depending on the workload requirements. This paper presents AVEC, a framework for accelerator virtualization in cloud-edge computing. The AVEC framework enables the offloading of deep learning workloads for inference from weak user devices to computationally more powerful devices in a cloud-edge network. AVEC incorporates a mechanism that efficiently manages and schedules the virtualization of accelerators. It also supports migration between accelerators to enable stateless container migration. The experimental analysis highlights that AVEC can achieve up to 7x speedup by offloading applications to remote resources. Furthermore, AVEC features a low migration downtime that is less than 5 seconds.
Pages: 2107-2123
Page count: 17