GPU-enabled Function-as-a-Service for Machine Learning Inference

Cited by: 3
Authors
Zhao, Ming [1]
Jha, Kritshekhar [1]
Hong, Sungho [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85287 USA
Funding
U.S. National Science Foundation;
Keywords
Function-as-a-Service; GPU scheduling; Caching; Machine learning inference;
DOI
10.1109/IPDPS54959.2023.00096
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Function-as-a-Service (FaaS) is emerging as an important cloud computing service model as it can improve the scalability and usability of a wide range of applications, especially Machine-Learning (ML) inference tasks that require scalable resources and complex software configurations. These inference tasks heavily rely on GPUs to achieve high performance; however, support for GPUs is currently lacking in the existing FaaS solutions. The unique event-triggered and short-lived nature of functions poses new challenges to enabling GPUs on FaaS, which must consider the overhead of transferring data (e.g., ML model parameters and inputs/outputs) between GPU and host memory. This paper proposes a novel GPU-enabled FaaS solution that enables ML inference functions to efficiently utilize GPUs to accelerate their computations. First, it extends existing FaaS frameworks such as OpenFaaS to support the scheduling and execution of functions across GPUs in a FaaS cluster. Second, it provides caching of ML models in GPU memory to improve the performance of model inference functions and global management of GPU memories to improve cache utilization. Third, it offers co-designed GPU function scheduling and cache management to optimize the performance of ML inference functions. Specifically, the paper proposes locality-aware scheduling, which maximizes the utilization of both GPU memory for cache hits and GPU cores for parallel processing. A thorough evaluation based on real-world traces and ML models shows that the proposed GPU-enabled FaaS works well for ML inference tasks, and the proposed locality-aware scheduler achieves a speedup of 48x compared to the default, load-balancing-only schedulers.
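
The idea of locality-aware scheduling described in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's scheduler: the `Gpu` class, the slot-based LRU cache, the `max_concurrent` limit, the round-robin baseline standing in for a load-balancing-only policy, and the model names in `TRACE` are all hypothetical. It counts how many requests find their model already cached in GPU memory under each policy.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Gpu:
    name: str
    cache_slots: int       # hypothetical: number of models that fit in GPU memory
    max_concurrent: int    # hypothetical: functions the GPU can run in parallel
    running: int = 0
    cache: OrderedDict = field(default_factory=OrderedDict)  # model names in LRU order

    def has_capacity(self) -> bool:
        return self.running < self.max_concurrent

    def run(self, model: str) -> bool:
        """Start one inference on this GPU; return True on a cache hit."""
        hit = model in self.cache
        if hit:
            self.cache.move_to_end(model)           # mark as most recently used
        else:
            if len(self.cache) >= self.cache_slots:
                self.cache.popitem(last=False)      # evict the least recently used model
            self.cache[model] = None                # "load" the model into GPU memory
        self.running += 1
        return hit

    def finish(self) -> None:
        self.running -= 1


def schedule_round_robin(gpus, model, seq):
    # Baseline: spreads load evenly and ignores what each GPU has cached.
    return gpus[seq % len(gpus)]


def schedule_locality_aware(gpus, model, seq):
    # Prefer a GPU that already caches the model and still has free capacity.
    warm = [g for g in gpus if model in g.cache and g.has_capacity()]
    if warm:
        return min(warm, key=lambda g: g.running)
    # Otherwise fall back to the least-loaded GPU, preferring an emptier cache.
    return min(gpus, key=lambda g: (g.running, len(g.cache)))


def replay(policy, requests, n_gpus=2):
    """Replay a trace and count cache hits.

    Simplification: each request finishes before the next one arrives.
    """
    gpus = [Gpu(f"gpu{i}", cache_slots=1, max_concurrent=4) for i in range(n_gpus)]
    hits = 0
    for seq, model in enumerate(requests):
        gpu = policy(gpus, model, seq)
        hits += gpu.run(model)
        gpu.finish()
    return hits


TRACE = ["resnet", "bert", "bert", "resnet", "resnet", "bert"]

if __name__ == "__main__":
    print("round-robin hits:   ", replay(schedule_round_robin, TRACE))
    print("locality-aware hits:", replay(schedule_locality_aware, TRACE))
```

On this six-request trace the round-robin baseline keeps evicting the model the next request needs and gets no hits, while the locality-aware policy pins each model to one GPU and hits on four requests. The numbers say nothing about the 48x speedup reported in the abstract, which comes from the paper's own evaluation on real-world traces.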
Pages: 918-928
Page count: 11