Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update

Cited by: 0
Authors
Sima, Chijun [1]
Fu, Yao [2]
Sit, Man-Kit [2]
Guo, Liyi [1]
Gong, Xuri [1]
Lin, Feng [1]
Wu, Junyu [1]
Li, Yongsheng [1]
Rong, Haidong [1]
Aublin, Pierre-Louis [3]
Mai, Luo [2]
Affiliations
[1] Tencent, Shenzhen, Peoples R China
[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[3] IIJ Res Lab, Tokyo, Japan
Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Deep Learning Recommender Systems (DLRSs) need to update models at low latency, thus promptly serving new users and content. Existing DLRSs, however, fail to do so. They train/validate models offline and broadcast entire models to global inference clusters. They thus incur significant model update latency (e.g., dozens of minutes), which adversely affects Service-Level Objectives (SLOs). This paper describes Ekko, a novel DLRS that enables low-latency model updates. Its design idea is to allow model updates to be immediately disseminated to all inference clusters, thus bypassing long-latency model checkpoint, validation and broadcast. To realise this idea, we first design an efficient peer-to-peer model update dissemination algorithm. This algorithm exploits the sparsity and temporal locality in updating DLRS models to improve the throughput and latency of updating models. Further, Ekko has a model update scheduler that can prioritise, over busy networks, the sending of model updates that can largely affect SLOs. Finally, Ekko has an inference model state manager which monitors the SLOs of inference models and rolls back the models if SLO-detrimental biased updates are detected. Evaluation results show that Ekko is orders of magnitude faster than state-of-the-art DLRS systems. Ekko has been deployed in production for more than one year, serves over a billion users daily and reduces the model update latency compared to state-of-the-art systems from dozens of minutes to 2.4 seconds.
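To make the scheduling idea in the abstract concrete, here is a minimal Python sketch of a scheduler that sends the most SLO-relevant updates first when bandwidth is limited. It is an illustration under stated assumptions, not Ekko's actual implementation: the abstract does not say how priorities are computed or how updates are batched, so the `UpdateScheduler` class, the numeric `priority` score, the per-round `budget`, and the choice to keep only the latest update per parameter key (a simple way to benefit from sparse, temporally local updates) are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(order=True)
class PendingUpdate:
    # heapq is a min-heap, so the priority is stored negated:
    # the most important update is popped first.
    sort_key: float
    key: str = field(compare=False)
    value: List[float] = field(compare=False)


class UpdateScheduler:
    """Hypothetical scheduler (not the paper's design): keeps the latest
    update per parameter key and sends high-priority updates first."""

    def __init__(self) -> None:
        # parameter key -> (priority, latest value)
        self._latest: Dict[str, Tuple[float, List[float]]] = {}

    def submit(self, key: str, value: List[float], priority: float) -> None:
        # A newer update to the same key replaces the older, unsent one.
        self._latest[key] = (priority, value)

    def next_batch(self, budget: int) -> List[Tuple[str, List[float]]]:
        # Assumed bandwidth model: at most `budget` updates per round.
        heap = [PendingUpdate(-p, k, v) for k, (p, v) in self._latest.items()]
        heapq.heapify(heap)
        batch: List[Tuple[str, List[float]]] = []
        while heap and len(batch) < budget:
            item = heapq.heappop(heap)
            batch.append((item.key, item.value))
            del self._latest[item.key]  # sent, so no longer pending
        return batch


scheduler = UpdateScheduler()
scheduler.submit("user:1", [0.1], priority=0.2)
scheduler.submit("item:9", [0.5], priority=0.9)
scheduler.submit("user:1", [0.3], priority=0.4)  # replaces the earlier user:1 update
first = scheduler.next_batch(budget=1)
rest = scheduler.next_batch(budget=5)
```

With a budget of one update per round, the higher-priority `item:9` update goes out first, and only the most recent `user:1` value is sent afterwards.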
Pages: 821-839
Page count: 19