RecSysOps: Best Practices for Operating a Large-Scale Recommender System

Cited by: 2
Authors
Saberian, Mohammad [1]
Basilico, Justin [1]
Affiliations
[1] Netflix Inc, Los Gatos, CA 95032 USA
Keywords
Recommender Systems; RecSysOps; error detection; error prediction; model diagnostic; model explainability
DOI
10.1145/3460231.3474620
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Ensuring the health of a modern large-scale recommendation system is a very challenging problem. To address it, we need to put in place proper logging and sophisticated exploration policies, develop ML-interpretability tools, or even train new ML models to predict or detect issues in the main production model. In this talk, we shine a light on this less-discussed but important area and share some of the best practices, called RecSysOps, that we have learned while operating our increasingly complex recommender systems at Netflix. RecSysOps is a set of best practices for identifying issues and gaps, as well as diagnosing and resolving them, in a large-scale machine-learned recommender system. RecSysOps helped us to 1) reduce production issues, 2) increase recommendation quality by identifying areas for improvement, and 3) bring new innovations to our members faster by letting us spend more of our time on new innovations and less on debugging and firefighting.
Pages: 590-591
Number of pages: 2