RecSysOps: Best Practices for Operating a Large-Scale Recommender System

Cited by: 2
Authors
Saberian, Mohammad [1]
Basilico, Justin [1]
Affiliations
[1] Netflix Inc, Los Gatos, CA 95032 USA
Keywords
Recommender Systems; RecSysOps; error detection; error prediction; model diagnostic; model explainability
DOI
10.1145/3460231.3474620
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Ensuring the health of a modern large-scale recommendation system is a very challenging problem. To address this, we need to put in place proper logging and sophisticated exploration policies, develop ML-interpretability tools, or even train new ML models to predict or detect issues with the main production model. In this talk, we shine a light on this less-discussed but important area and share some of the best practices, called RecSysOps, that we've learned while operating our increasingly complex recommender systems at Netflix. RecSysOps is a set of best practices for identifying issues and gaps, as well as diagnosing and resolving them, in a large-scale machine-learned recommender system. RecSysOps helped us to 1) reduce production issues, 2) increase recommendation quality by identifying areas of improvement, and 3) bring new innovations to our members faster by enabling us to spend more of our time on new innovations and less on debugging and firefighting issues.
Pages: 590-591
Page count: 2