Automated Backend Allocation for Multi-Model, On-Device AI Inference

Authors
Iyer V. [1]
Lee S. [1]
Lee S. [1]
Kim J.J. [1]
Kim H. [1]
Shin Y. [1]
Affiliations
[1] Samsung Electronics, Seoul
Source
Performance Evaluation Review | 2024 / Vol. 52 / Issue 01
Keywords
control feedback; neural networks; on-device AI; Pareto fronts
DOI
10.1145/3673660.3655046
Abstract
On-Device Artificial Intelligence (AI) services such as face recognition, object tracking and voice recognition are rapidly scaling up deployments on embedded, memory-constrained hardware devices. These services typically delegate AI inference models for execution on CPU and GPU computing backends. While GPU delegation is a common practice for achieving high-speed computation, the approach suffers from degraded throughput and completion times in multi-model scenarios, i.e., when services execute concurrently. This paper introduces a solution to sustain performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends per model. The allocation is feedback-driven and guided by knowledge of model-specific, multi-objective Pareto fronts comprising inference latency and memory consumption. Our backend allocation algorithm runs online per model and achieves a 25-100% improvement in throughput over static allocations as well as load-balancing scheduler solutions targeting multi-model scenarios. © 2024 Owner/Author.
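The abstract does not spell out the allocation algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions: each model has a set of candidate CPU/GPU backend splits with profiled latency and memory, the non-dominated splits form a Pareto front, and a feedback step moves along that front. The `Config` type, the example numbers, and the `next_config` rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One hypothetical backend split for a model (values are illustrative)."""
    name: str
    latency_ms: float
    memory_mb: float


def pareto_front(configs):
    """Return the configs not dominated in (latency, memory), fastest first."""
    front = []
    for c in configs:
        dominated = any(
            o.latency_ms <= c.latency_ms
            and o.memory_mb <= c.memory_mb
            and (o.latency_ms < c.latency_ms or o.memory_mb < c.memory_mb)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.latency_ms)


def next_config(front, current, observed_ms, target_ms, memory_budget_mb):
    """One assumed feedback step along a front sorted by ascending latency."""
    i = front.index(current)
    # Over the memory budget: move to the next point, which uses less memory.
    if current.memory_mb > memory_budget_mb and i + 1 < len(front):
        return front[i + 1]
    # Slower than the target: take the faster neighbour if its memory fits.
    if (observed_ms > target_ms and i > 0
            and front[i - 1].memory_mb <= memory_budget_mb):
        return front[i - 1]
    return current


# Hypothetical profile for a single model.
candidates = [
    Config("cpu_only", 40.0, 100.0),
    Config("gpu_only", 10.0, 300.0),
    Config("mixed", 20.0, 180.0),
    Config("wasteful", 45.0, 320.0),  # dominated, so it is dropped
]
front = pareto_front(candidates)
mixed = next(c for c in front if c.name == "mixed")
chosen = next_config(front, mixed, observed_ms=35.0, target_ms=25.0,
                     memory_budget_mb=400.0)
```

In this sketch the controller only ever selects points on the front, so each step trades latency against memory rather than choosing a split that is worse on both objectives.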
Pages: 27 - 28
Page count: 1