RecSysOps: Best Practices for Operating a Large-Scale Recommender System

Cited by: 2

Authors:

Saberian, Mohammad [1]

Basilico, Justin [1]

Affiliations:

[1] Netflix Inc, Los Gatos, CA 95032 USA

Source:

15th ACM Conference on Recommender Systems (RecSys 2021) | 2021

Keywords:

Recommender Systems; RecSysOps; error detection; error prediction; model diagnostic; model explainability

DOI:

10.1145/3460231.3474620

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Ensuring the health of a modern large-scale recommendation system is a very challenging problem. To address this, we need to put in place proper logging and sophisticated exploration policies, develop ML-interpretability tools, or even train new ML models to predict or detect issues with the main production model. In this talk, we shine a light on this less-discussed but important area and share some of the best practices, called RecSysOps, that we've learned while operating our increasingly complex recommender systems at Netflix. RecSysOps is a set of best practices for identifying issues and gaps, as well as diagnosing and resolving them, in a large-scale machine-learned recommender system. RecSysOps helped us to 1) reduce production issues, 2) increase recommendation quality by identifying areas of improvement, and 3) bring new innovations to our members faster by enabling us to spend more of our time on new innovations and less on debugging and firefighting.

Pages: 590-591

Page count: 2
