TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function-as-a-Service

Cited by: 14
Authors
Dakkak, Abdul [1 ]
Li, Cheng [1 ]
de Gonzalo, Simon Garcia [1 ]
Xiong, Jinjun [2 ]
Hwu, Wen-mei [3 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] IBM Thomas J Watson Res Ctr, Yorktown Hts, NY USA
[3] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL USA
DOI
10.1109/CLOUD.2019.00067
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines. Cloud computing, as the de facto backbone of modern computing infrastructure, has to be able to handle user-defined FaaS pipelines containing diverse DNN inference workloads while maintaining isolation and latency guarantees with minimal resource waste. The current solution for guaranteeing isolation and latency within FaaS is inefficient. A major cause of the inefficiency is the need to move large amounts of data within and across servers. We propose TrIMS as a novel solution to address this issue. TrIMS is a generic memory sharing technique that enables constant data to be shared across processes or containers while still maintaining isolation between users. TrIMS consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy; an efficient resource management layer that provides isolation; and a succinct set of abstractions, application APIs, and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework, achieving up to 24x speedup in latency for image classification models, up to 210x speedup for large models, and up to 8x improvement in system throughput.
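The abstract describes a persistent model store that walks a GPU / CPU / local-storage / cloud hierarchy and hands callers a shared, read-only copy of constant model weights instead of reloading them per process. The sketch below illustrates that lookup pattern only, under stated assumptions; every name in it (ModelStore, ModelEntry, share_handle, the tier names) is hypothetical and is not the TrIMS or Apache MXNet API.

# Minimal, hypothetical sketch of the model-store lookup pattern described in
# the abstract; names and behavior are illustrative assumptions, not the TrIMS API.
from dataclasses import dataclass, field
from typing import Dict

# Memory/storage hierarchy named in the abstract: GPU, CPU, local storage, cloud.
LEVELS = ("gpu", "cpu", "local_disk", "cloud")

@dataclass
class ModelEntry:
    name: str
    level: str          # tier that currently holds the constant (read-only) weights
    share_handle: int   # opaque handle a client process would map read-only

@dataclass
class ModelStore:
    """Persistent store of constant model weights, shared across processes while
    each user's mutable state stays private, preserving isolation."""
    entries: Dict[str, ModelEntry] = field(default_factory=dict)
    next_handle: int = 0

    def open(self, name: str) -> ModelEntry:
        # Hit: hand back the existing shared handle with no data movement,
        # which is where the latency savings reported in the abstract come from.
        if name in self.entries:
            return self.entries[name]
        # Miss: load once through the hierarchy and publish the entry for sharing.
        entry = self._load_through_hierarchy(name)
        self.entries[name] = entry
        return entry

    def _load_through_hierarchy(self, name: str) -> ModelEntry:
        # Placeholder for copying weights upward (cloud -> local disk -> CPU -> GPU)
        # and pinning them for reuse; here we just mint a fresh shared handle.
        self.next_handle += 1
        return ModelEntry(name=name, level=LEVELS[0], share_handle=self.next_handle)

if __name__ == "__main__":
    store = ModelStore()
    cold = store.open("resnet50")   # first request: loaded through the hierarchy
    warm = store.open("resnet50")   # later requests: reuse the shared handle
    assert cold.share_handle == warm.share_handle
    print(f"'{warm.name}' served from tier '{warm.level}' via handle {warm.share_handle}")

In this pattern the store process owns the constant weights and clients only receive a handle to map, so duplicate copies of the same model are avoided while per-user state remains separate.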
Pages: 372-382
Page count: 11