Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update

Cited by: 0
Authors
Sima, Chijun [1]
Fu, Yao [2]
Sit, Man-Kit [2]
Guo, Liyi [1]
Gong, Xuri [1]
Lin, Feng [1]
Wu, Junyu [1]
Li, Yongsheng [1]
Rong, Haidong [1]
Aublin, Pierre-Louis [3]
Mai, Luo [2]
Affiliations
[1] Tencent, Shenzhen, People's Republic of China
[2] University of Edinburgh, Edinburgh, Midlothian, Scotland
[3] IIJ Research Laboratory, Tokyo, Japan
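Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract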
Deep Learning Recommender Systems (DLRSs) need to update models at low latency, thus promptly serving new users and content. Existing DLRSs, however, fail to do so. They train/validate models offline and broadcast entire models to global inference clusters. They thus incur significant model update latency (e.g., dozens of minutes), which adversely affects Service-Level Objectives (SLOs). This paper describes Ekko, a novel DLRS that enables low-latency model updates. Its design idea is to allow model updates to be immediately disseminated to all inference clusters, thus bypassing long-latency model checkpoint, validation and broadcast. To realise this idea, we first design an efficient peer-to-peer model update dissemination algorithm. This algorithm exploits the sparsity and temporal locality in updating DLRS models to improve the throughput and latency of updating models. Further, Ekko has a model update scheduler that can prioritise, over busy networks, the sending of model updates that can largely affect SLOs. Finally, Ekko has an inference model state manager which monitors the SLOs of inference models and rolls back the models if SLO-detrimental biased updates are detected. Evaluation results show that Ekko is orders of magnitude faster than state-of-the-art DLRS systems. Ekko has been deployed in production for more than one year, serves over a billion users daily and reduces the model update latency compared to state-of-the-art systems from dozens of minutes to 2.4 seconds.
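The scheduling idea in the abstract (sending the most SLO-relevant updates first when the network is busy) can be illustrated with a small priority queue. This is only a sketch under stated assumptions, not Ekko's actual scheduler: the abstract does not say how priorities are computed or how bandwidth is modelled, so the `priority` score, the per-round byte budget, and the names `PendingUpdate` and `UpdateScheduler` are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class PendingUpdate:
    # heapq is a min-heap, so the negated priority is stored to pop the
    # highest-priority update first. Only sort_key takes part in ordering.
    sort_key: float
    param_id: int = field(compare=False)
    size_bytes: int = field(compare=False)


class UpdateScheduler:
    """Toy scheduler (hypothetical): sends high-priority updates within a byte budget."""

    def __init__(self):
        self._heap = []

    def submit(self, param_id, size_bytes, priority):
        # `priority` is an assumed score for how strongly the update affects SLOs.
        heapq.heappush(self._heap, PendingUpdate(-priority, param_id, size_bytes))

    def drain(self, budget_bytes):
        """Pop updates in priority order until the next one no longer fits."""
        sent = []
        while self._heap and self._heap[0].size_bytes <= budget_bytes:
            update = heapq.heappop(self._heap)
            budget_bytes -= update.size_bytes
            sent.append(update.param_id)
        return sent


scheduler = UpdateScheduler()
scheduler.submit(param_id=1, size_bytes=400, priority=0.2)
scheduler.submit(param_id=2, size_bytes=400, priority=0.9)
scheduler.submit(param_id=3, size_bytes=400, priority=0.5)

first_round = scheduler.drain(budget_bytes=800)   # two highest-priority updates
second_round = scheduler.drain(budget_bytes=800)  # the remaining update
print(first_round, second_round)
```

With an 800-byte budget per round, the two highest-priority updates go out first and the lowest-priority one waits for the next round, which is the behaviour the abstract attributes to prioritised sending over a congested link.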
Pages: 821 - 839
Page count: 19
Related papers
50 in total
  • [1] SDM: Sequential Deep Matching Model for Online Large-scale Recommender System
    Lv, Fuyu
    Jin, Taiwei
    Yu, Changlong
    Sun, Fei
    Lin, Quan
    Yang, Keping
    Ng, Wilfred
    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2635 - 2643
  • [2] Constructing Large-scale Low-latency Network from Small Optimal Networks
    Mizuno, Ryosuke
    Ishida, Yawara
    2016 TENTH IEEE/ACM INTERNATIONAL SYMPOSIUM ON NETWORKS-ON-CHIP (NOCS), 2016,
  • [3] LIVENET: A Low-Latency Video Transport Network for Large-Scale Live Streaming
    Li, Jinyang
    Li, Zhenyu
    Lu, Ri
    Xiao, Kai
    Li, Songlin
    Chen, Jufeng
    Yang, Jingyu
    Zong, Chunli
    Chen, Aiyun
    Wu, Qinghua
    Sun, Chen
    Tyson, Gareth
    Liu, Hongqiang Harry
    SIGCOMM '22: PROCEEDINGS OF THE 2022 ACM SIGCOMM 2022 CONFERENCE, 2022, : 812 - 825
  • [4] Large-scale recommender system with compact latent factor model
    Liu, Chien-Liang
    Wu, Xuan-Wei
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 64 : 467 - 475
  • [5] Deep Learning loss model for large-scale low voltage smart grids
    Velasco, Jose Angel
    Amaris, Hortensia
    Alonso, Monica
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2020, 121
  • [6] Low-latency Job Scheduling with Preemption for the Development of Deep Learning
    Yabuuchi, Hidehito
    Taniwaki, Daisuke
    Omura, Shingo
    PROCEEDINGS OF THE 2019 USENIX CONFERENCE ON OPERATIONAL MACHINE LEARNING, 2019, : 27 - 30
  • [7] Large-scale e-learning recommender system based on Spark and Hadoop
    Dandouh, Karim
    Dakkak, Ahmed
    Oughdir, Lahcen
    Ibriz, Abdelali
    JOURNAL OF BIG DATA, 2019, 6 (01)
  • [8] Large-scale e-learning recommender system based on Spark and Hadoop
    Karim Dahdouh
    Ahmed Dakkak
    Lahcen Oughdir
    Abdelali Ibriz
    Journal of Big Data, 6
  • [9] Online Learning in Large-Scale Contextual Recommender Systems
    Song, Linqi
    Tekin, Cem
    van der Schaar, Mihaela
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2016, 9 (03) : 433 - 445
  • [10] Optical RF Tone In-Band Labeling for Large-Scale and Low-Latency Optical Packet Switches
    Luo, Jun
    Dorren, Harm J. S.
    Calabretta, Nicola
    JOURNAL OF LIGHTWAVE TECHNOLOGY, 2012, 30 (16) : 2637 - 2645