Automated Backend Allocation for Multi-Model, On-Device AI Inference

Cited: 0
Authors
Iyer V. [1 ]
Lee S. [1 ]
Lee S. [1 ]
Kim J.J. [1 ]
Kim H. [1 ]
Shin Y. [1 ]
Affiliations
[1] Samsung Electronics, Seoul
Source
Performance Evaluation Review | 2024, Vol. 52, No. 01
Keywords
control feedback; neural networks; on-device AI; Pareto fronts
DOI
10.1145/3673660.3655046
Abstract
On-device Artificial Intelligence (AI) services such as face recognition, object tracking, and voice recognition are rapidly scaling up deployments on embedded, memory-constrained hardware devices. These services typically delegate AI inference models for execution on CPU and GPU computing backends. While GPU delegation is a common practice for achieving high-speed computation, the approach suffers from degraded throughput and completion times in multi-model scenarios, i.e., when services execute concurrently. This paper introduces a solution that sustains performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends per model. The allocation is feedback-driven and guided by knowledge of model-specific, multi-objective Pareto fronts comprising inference latency and memory consumption. Our backend allocation algorithm runs online per model and achieves a 25-100% improvement in throughput over static allocations, as well as over load-balancing scheduler solutions targeting multi-model scenarios. © 2024 Owner/Author.
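The abstract describes selecting a per-model backend configuration from a Pareto front of (inference latency, memory consumption) points. The sketch below is purely illustrative of that selection step and is not the authors' implementation; all configuration names, numbers, and the memory-budget heuristic are hypothetical assumptions.

```python
# Illustrative sketch: choose a backend configuration for one model from
# its latency/memory Pareto front, subject to a memory budget.
# All configs and values below are hypothetical.

def pareto_front(configs):
    """Keep configurations not dominated in both latency and memory."""
    return [c for c in configs
            if not any(o["latency_ms"] <= c["latency_ms"] and
                       o["memory_mb"] <= c["memory_mb"] and o != c
                       for o in configs)]

def allocate(configs, memory_budget_mb):
    """Pick the lowest-latency Pareto-optimal config within the budget."""
    feasible = [c for c in pareto_front(configs)
                if c["memory_mb"] <= memory_budget_mb]
    return min(feasible, key=lambda c: c["latency_ms"]) if feasible else None

# Hypothetical profiled configurations for one model:
candidates = [
    {"backend": "GPU",     "latency_ms": 10, "memory_mb": 300},
    {"backend": "CPU",     "latency_ms": 40, "memory_mb": 80},
    {"backend": "CPU+GPU", "latency_ms": 18, "memory_mb": 150},
]

print(allocate(candidates, memory_budget_mb=200))
```

In a feedback-driven setting, the `memory_budget_mb` input would be updated from observed system load, so the chosen point can move along the Pareto front as concurrent models arrive or depart.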
Pages: 27-28
Page count: 1
Related Papers
50 entries in total
  • [1] Automated Backend Allocation for Multi-Model, On-Device AI Inference
    Iyer, Venkatraman
    Lee, Sungho
    Lee, Semun
    Kim, Juitem Joonwoo
    Kim, Hyunjun
    Shin, Youngjae
    PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2023, 7 (03)
  • [2] AI on the Move: From On-Device to On-Multi-Device
    Flores, Huber
    Nurmi, Petteri
    Hui, Pan
    2019 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS (PERCOM WORKSHOPS), 2019, : 310 - 315
  • [3] Automated Customization of On-Device Inference for Quality-of-Experience Enhancement
    Bai, Yang
    Chen, Lixing
    Ren, Shaolei
    Xu, Jie
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (05) : 1329 - 1342
  • [4] Multi-Model Inference in Biogeography
    Millington, James D. A.
    Perry, George L. W.
    GEOGRAPHY COMPASS, 2011, 5 (07): : 448 - 463
  • [5] Multi-Task Adapters for On-Device Audio Inference
    Tagliasacchi, Marco
    Quitry, Felix de Chaumont
    Roblek, Dominik
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 630 - 634
  • [6] Schema Inference for Multi-Model Data
    Koupil, Pavel
    Hricko, Sebastian
    Holubova, Irena
    PROCEEDINGS OF THE 25TH INTERNATIONAL ACM/IEEE CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS, MODELS 2022, 2022, : 13 - 23
  • [7] A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference
    Mazumder, Arnab Neelim
    Meng, Jian
    Rashid, Hasib-Al
    Kallakuri, Utteja
    Zhang, Xin
    Seo, Jae-Sun
    Mohsenin, Tinoosh
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2021, 11 (04) : 532 - 547
  • [8] A Backend-Friendly On-Device Multi-channel Speech Enhancement System with IPD and PHM
    Wen, Wen
    Qian, Jingrui
    Zhang, Yifan
    Xi, Yu
    Jiang, Wenbin
    Zhou, Qiang
    Liu, Beiyi
    Guo, Yao
    Yu, Kai
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024, 2025, 2312 : 28 - 43
  • [9] Mistify: Automating DNN Model Porting for On-Device Inference at the Edge
    Guo, Peizhen
    Hu, Bo
    Hu, Wenjun
    PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON NETWORKED SYSTEM DESIGN AND IMPLEMENTATION, 2021, : 705 - 720
  • [10] An On-Device Machine Reading Comprehension Model with Adaptive Fast Inference
    Nan, Fulai
    Wang, Jin
    Zhang, Xuejie
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I, 2022, 13551 : 850 - 862