InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models

Cited by: 1

Authors:

Nagrecha, Kabir [1]

Liu, Lingyi [1]

Delgado, Pablo [1]

Padmanabhan, Prasanna [1]

Affiliation:

[1] Netflix Inc, Los Gatos, CA 95032 USA

Source:

PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023 | 2023

Keywords:

data processing; recommendation systems; deep learning; parallel computing; resource allocation;

DOI:

10.1145/3604915.3608778

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved only for DLRM training, driving new interest in cost- and time-saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning (DL) training jobs are dominated by model execution times, the most important factor in DLRM training performance is often online data ingestion. In this paper, we explore the unique characteristics of this data ingestion problem and provide insights into the specific bottlenecks and challenges of the DLRM training pipeline at scale. We study real-world DLRM data processing pipelines taken from our compute cluster at Netflix, both to observe the performance impacts of online ingestion and to identify shortfalls in existing data pipeline optimizers. We find that current tooling yields sub-optimal performance, crashes frequently, or requires impractical cluster re-organization to adopt. Our studies lead us to design and build a new solution for data pipeline optimization, InTune. InTune employs a reinforcement learning (RL) agent to learn how to distribute the CPU resources of a trainer machine across a DLRM data pipeline to more effectively parallelize data-loading and improve throughput. Our experiments show that InTune can build an optimized data pipeline configuration within only a few minutes, and can easily be integrated into existing training workflows. By exploiting the responsiveness and adaptability of RL, InTune achieves significantly higher online data ingestion rates than existing optimizers, thus reducing idle times in model execution and increasing efficiency. We apply InTune to our real-world cluster, and find that it increases data ingestion throughput by as much as 2.29X versus current state-of-the-art data pipeline optimizers while also improving both CPU and GPU utilization.


Pages: 430-442

Page count: 13
