Automated Backend Allocation for Multi-Model, On-Device AI Inference

Cited by: 0
Authors
Iyer V. [1 ]
Lee S. [1 ]
Lee S. [1 ]
Kim J.J. [1 ]
Kim H. [1 ]
Shin Y. [1 ]
Affiliations
[1] Samsung Electronics, Seoul
Source
Performance Evaluation Review | 2024 / Vol. 52 / Issue 01
Keywords
control feedback; neural networks; on-device AI; Pareto fronts
DOI
10.1145/3673660.3655046
Abstract
On-Device Artificial Intelligence (AI) services such as face recognition, object tracking and voice recognition are rapidly scaling up deployments on embedded, memory-constrained hardware devices. These services typically delegate AI inference models for execution on CPU and GPU computing backends. While GPU delegation is a common practice to achieve high-speed computation, the approach suffers from degraded throughput and completion times in multi-model scenarios, i.e., with concurrently executing services. This paper introduces a solution to sustain performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends per model. The allocation is feedback-driven and guided by knowledge of model-specific, multi-objective Pareto fronts comprising inference latency and memory consumption. Our backend allocation algorithm runs online per model and achieves a 25-100% improvement in throughput over static allocations as well as over load-balancing scheduler solutions targeting multi-model scenarios. © 2024 Owner/Author.
Pages: 27-28
Number of pages: 1
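The abstract describes a feedback-driven allocator that, for each model, picks a CPU/GPU backend combination from a precomputed Pareto front over inference latency and memory consumption. The sketch below only illustrates that idea; the names (Allocation, pareto_front, reallocate), the allocation knobs, and the latency-target policy are assumptions, not the authors' implementation, which is not given in this record.

```python
# Illustrative sketch (not the paper's code): feedback-driven selection of a
# CPU/GPU backend combination for one model, guided by a profiled Pareto
# front over (inference latency, memory consumption).
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Allocation:
    cpu_threads: int    # assumed knob: CPU worker threads given to the model
    use_gpu: bool       # assumed knob: whether the GPU delegate is enabled
    latency_ms: float   # profiled inference latency at this operating point
    memory_mb: float    # profiled memory footprint at this operating point


def pareto_front(points: List[Allocation]) -> List[Allocation]:
    """Keep operating points not dominated in both latency and memory."""
    return [p for p in points
            if not any(q.latency_ms <= p.latency_ms
                       and q.memory_mb <= p.memory_mb
                       and (q.latency_ms, q.memory_mb) != (p.latency_ms, p.memory_mb)
                       for q in points)]


def reallocate(front: List[Allocation], current: Allocation,
               observed_latency_ms: float, target_latency_ms: float,
               free_memory_mb: float) -> Allocation:
    """One feedback step: on a latency-target miss, move along the front to
    the fastest point that still fits in free memory; otherwise relax to the
    cheapest point, freeing memory for co-running models."""
    feasible = [a for a in front if a.memory_mb <= free_memory_mb]
    if not feasible:
        return current
    if observed_latency_ms > target_latency_ms:
        return min(feasible, key=lambda a: a.latency_ms)
    return min(feasible, key=lambda a: a.memory_mb)
```

In use, each model would be profiled at a handful of CPU-thread/GPU-delegate combinations to build its front, and reallocate would be invoked periodically with the observed latency and currently free memory; the actual controller, profiling interface, and allocation knobs used by the authors are not specified in this record.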