Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update

Cited by: 0

Authors:

Sima, Chijun [1]

Fu, Yao [2]

Sit, Man-Kit [2]

Guo, Liyi [1]

Gong, Xuri [1]

Lin, Feng [1]

Wu, Junyu [1]

Li, Yongsheng [1]

Rong, Haidong [1]

Aublin, Pierre-Louis [3]

Mai, Luo [2]

Affiliations:

[1] Tencent, Shenzhen, Peoples R China

[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland

[3] IIJ Res Lab, Tokyo, Japan

Source:

PROCEEDINGS OF THE 16TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2022 | 2022

Keywords:

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Deep Learning Recommender Systems (DLRSs) need to update models at low latency, thus promptly serving new users and content. Existing DLRSs, however, fail to do so. They train/validate models offline and broadcast entire models to global inference clusters. They thus incur significant model update latency (e.g., dozens of minutes), which adversely affects Service-Level Objectives (SLOs). This paper describes Ekko, a novel DLRS that enables low-latency model updates. Its design idea is to allow model updates to be immediately disseminated to all inference clusters, thus bypassing long-latency model checkpoint, validation and broadcast. To realise this idea, we first design an efficient peer-to-peer model update dissemination algorithm. This algorithm exploits the sparsity and temporal locality in updating DLRS models to improve the throughput and latency of updating models. Further, Ekko has a model update scheduler that can prioritise, over busy networks, the sending of model updates that can largely affect SLOs. Finally, Ekko has an inference model state manager which monitors the SLOs of inference models and rolls back the models if SLO-detrimental biased updates are detected. Evaluation results show that Ekko is orders of magnitude faster than state-of-the-art DLRSs. Ekko has been deployed in production for more than one year, serves over a billion users daily and reduces the model update latency compared to state-of-the-art systems from dozens of minutes to 2.4 seconds.


Pages: 821-839

Page count: 19
