Automated Backend Allocation for Multi-Model, On-Device AI Inference

Cited by: 0
Authors
Iyer V. [1 ]
Lee S. [1 ]
Lee S. [1 ]
Kim J.J. [1 ]
Kim H. [1 ]
Shin Y. [1 ]
Affiliations
[1] Samsung Electronics, Seoul
Source
Performance Evaluation Review | 2024 / Vol. 52 / Issue 01
Keywords
control feedback; neural networks; on-device AI; Pareto fronts
DOI
10.1145/3673660.3655046
Abstract
On-Device Artificial Intelligence (AI) services such as face recognition, object tracking and voice recognition are rapidly scaling up deployments on embedded, memory-constrained hardware devices. These services typically delegate AI inference models for execution on CPU and GPU computing backends. While GPU delegation is a common practice to achieve high-speed computation, the approach suffers from degraded throughput and completion times in multi-model scenarios, i.e., with concurrently executing services. This paper introduces a solution to sustain performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends per model. The allocation is feedback-driven and guided by knowledge of model-specific, multi-objective Pareto fronts comprising inference latency and memory consumption. Our backend allocation algorithm runs online per model and achieves a 25-100% improvement in throughput over static allocations as well as over load-balancing scheduler solutions targeting multi-model scenarios. © 2024 Owner/Author.
Pages: 27-28
Number of pages: 1
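The abstract describes a feedback-driven allocator that, for each model, picks a CPU/GPU backend combination from a precomputed Pareto front over inference latency and memory consumption. The sketch below only illustrates that idea; the names (Allocation, pareto_front, reallocate), the allocation knobs, and the latency-target policy are assumptions, not the authors' implementation, which is not given in this record.

```python
# Illustrative sketch (not the paper's code): feedback-driven selection of a
# CPU/GPU backend combination for one model, guided by a profiled Pareto
# front over (inference latency, memory consumption).
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Allocation:
    cpu_threads: int    # assumed knob: CPU worker threads given to the model
    use_gpu: bool       # assumed knob: whether the GPU delegate is enabled
    latency_ms: float   # profiled inference latency at this operating point
    memory_mb: float    # profiled memory footprint at this operating point


def pareto_front(points: List[Allocation]) -> List[Allocation]:
    """Keep operating points not dominated in both latency and memory."""
    return [p for p in points
            if not any(q.latency_ms <= p.latency_ms
                       and q.memory_mb <= p.memory_mb
                       and (q.latency_ms, q.memory_mb) != (p.latency_ms, p.memory_mb)
                       for q in points)]


def reallocate(front: List[Allocation], current: Allocation,
               observed_latency_ms: float, target_latency_ms: float,
               free_memory_mb: float) -> Allocation:
    """One feedback step: on a latency-target miss, move along the front to
    the fastest point that still fits in free memory; otherwise relax to the
    cheapest point, freeing memory for co-running models."""
    feasible = [a for a in front if a.memory_mb <= free_memory_mb]
    if not feasible:
        return current
    if observed_latency_ms > target_latency_ms:
        return min(feasible, key=lambda a: a.latency_ms)
    return min(feasible, key=lambda a: a.memory_mb)
```

In use, each model would be profiled at a handful of CPU-thread/GPU-delegate combinations to build its front, and reallocate would be invoked periodically with the observed latency and currently free memory; the actual controller, profiling interface, and allocation knobs used by the authors are not specified in this record.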