Automated Backend Allocation for Multi-Model, On-Device AI Inference

Cited: 0
Authors
Iyer V. [1 ]
Lee S. [1 ]
Lee S. [1 ]
Kim J.J. [1 ]
Kim H. [1 ]
Shin Y. [1 ]
Affiliations
[1] Samsung Electronics, Seoul
Source
Performance Evaluation Review | 2024, Vol. 52, No. 01
Keywords
control feedback; neural networks; on-device AI; Pareto fronts
DOI
10.1145/3673660.3655046
Abstract
On-device Artificial Intelligence (AI) services such as face recognition, object tracking, and voice recognition are rapidly scaling up deployments on embedded, memory-constrained hardware devices. These services typically delegate AI inference models for execution on CPU and GPU computing backends. While GPU delegation is a common practice for achieving high-speed computation, the approach suffers from degraded throughput and completion times in multi-model scenarios, i.e., when services execute concurrently. This paper introduces a solution that sustains performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends per model. The allocation is feedback-driven and guided by knowledge of model-specific, multi-objective Pareto fronts comprising inference latency and memory consumption. Our backend allocation algorithm runs online per model and achieves a 25-100% improvement in throughput over static allocations, as well as over load-balancing scheduler solutions targeting multi-model scenarios. © 2024 Owner/Author.
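The abstract describes selecting a per-model backend configuration from a Pareto front of (inference latency, memory consumption) points. The sketch below is purely illustrative of that selection step and is not the authors' implementation; all configuration names, numbers, and the memory-budget heuristic are hypothetical assumptions.

```python
# Illustrative sketch: choose a backend configuration for one model from
# its latency/memory Pareto front, subject to a memory budget.
# All configs and values below are hypothetical.

def pareto_front(configs):
    """Keep configurations not dominated in both latency and memory."""
    return [c for c in configs
            if not any(o["latency_ms"] <= c["latency_ms"] and
                       o["memory_mb"] <= c["memory_mb"] and o != c
                       for o in configs)]

def allocate(configs, memory_budget_mb):
    """Pick the lowest-latency Pareto-optimal config within the budget."""
    feasible = [c for c in pareto_front(configs)
                if c["memory_mb"] <= memory_budget_mb]
    return min(feasible, key=lambda c: c["latency_ms"]) if feasible else None

# Hypothetical profiled configurations for one model:
candidates = [
    {"backend": "GPU",     "latency_ms": 10, "memory_mb": 300},
    {"backend": "CPU",     "latency_ms": 40, "memory_mb": 80},
    {"backend": "CPU+GPU", "latency_ms": 18, "memory_mb": 150},
]

print(allocate(candidates, memory_budget_mb=200))
```

In a feedback-driven setting, the `memory_budget_mb` input would be updated from observed system load, so the chosen point can move along the Pareto front as concurrent models arrive or depart.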
Pages: 27-28
Page count: 1
Related Papers
50 entries in total
  • [1] Automated Backend Allocation for Multi-Model, On-Device AI Inference
    Iyer, Venkatraman
    Lee, Sungho
    Lee, Semun
    Kim, Juitem Joonwoo
    Kim, Hyunjun
    Shin, Youngjae
    PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2023, 7 (03)
  • [2] AI on the Move: From On-Device to On-Multi-Device
    Flores, Huber
    Nurmi, Petteri
    Hui, Pan
    2019 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS (PERCOM WORKSHOPS), 2019, : 310 - 315
  • [3] Automated Customization of On-Device Inference for Quality-of-Experience Enhancement
    Bai, Yang
    Chen, Lixing
    Ren, Shaolei
    Xu, Jie
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (05) : 1329 - 1342
  • [4] Multi-Model Inference in Biogeography
    Millington, James D. A.
    Perry, George L. W.
    GEOGRAPHY COMPASS, 2011, 5 (07): : 448 - 463
  • [5] Multi-Task Adapters for On-Device Audio Inference
    Tagliasacchi, Marco
    Quitry, Felix de Chaumont
    Roblek, Dominik
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 630 - 634
  • [6] Schema Inference for Multi-Model Data
    Koupil, Pavel
    Hricko, Sebastian
    Holubova, Irena
    PROCEEDINGS OF THE 25TH INTERNATIONAL ACM/IEEE CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS, MODELS 2022, 2022, : 13 - 23
  • [7] A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference
    Mazumder, Arnab Neelim
    Meng, Jian
    Rashid, Hasib-Al
    Kallakuri, Utteja
    Zhang, Xin
    Seo, Jae-Sun
    Mohsenin, Tinoosh
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2021, 11 (04) : 532 - 547
  • [8] A Backend-Friendly On-Device Multi-channel Speech Enhancement System with IPD and PHM
    Wen, Wen
    Qian, Jingrui
    Zhang, Yifan
    Xi, Yu
    Jiang, Wenbin
    Zhou, Qiang
    Liu, Beiyi
    Guo, Yao
    Yu, Kai
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024, 2025, 2312 : 28 - 43
  • [9] Mistify: Automating DNN Model Porting for On-Device Inference at the Edge
    Guo, Peizhen
    Hu, Bo
    Hu, Wenjun
    PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON NETWORKED SYSTEM DESIGN AND IMPLEMENTATION, 2021, : 705 - 720
  • [10] An On-Device Machine Reading Comprehension Model with Adaptive Fast Inference
    Nan, Fulai
    Wang, Jin
    Zhang, Xuejie
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I, 2022, 13551 : 850 - 862